THE PROBLEM


For the past fifteen or twenty years studies of the new type 
examination—the so-called objective test or examination—have 
abounded in educational literature. This type of examination, 
because of the many advantages which the various studies have found 
in it as compared with the essay type of examination, has supplanted 
the latter more or less indiscriminately in the school situation. That 
the merits of the new type of examination are true ones cannot be 
denied. This is shown in such summaries of the field as those made by 
Kinney and Eurich* and Lee and Symonds.® Kinney and Eurich in 
the closing paragraphs of their article suggest an experiment which 
seems to be fundamental in an evaluation of the two types of tests. 
They say: 


For other questions, however, experimental techniques are available. This 
would be the case in the investigation of the possible relation between study habits 
of pupils and types of questions used in examinations. It has been suggested 
frequently that the use of the subjective examination stimulates the pupil to study 
in order to acquire an organized body of information, and observe the relationships 
and implications of the facts thus learned. On the other hand, the assertion has 
been made that pupils expecting to be tested with an objective examination are 
more apt to memorize unrelated facts without a consideration of their interrela- 
tionships. If study habits are in fact related to the type of test question used, 
it is a matter of fundamental importance that should be taken into consideration 
in deciding upon the type of question to be used in examinations. 





* Abstract of a dissertation submitted in partial fulfillment of the requirements 
for the degree of Doctor of Philosophy in the University of Michigan. 
That study habits are related to the types of test questions has
been shown by several investigations. Crawford! has published a list 
of aids to be used in studying for the various types of objective exami- 
nations. This list was compiled from the reports made by thirty 
graduate students on how they studied for objective examinations. 
Some of the aids listed were: Studying by extensive rather than by 
intensive methods; reviewing for the test by skimming over the high 
points; watching for key words, dates, proper names, and other 
highly concrete, factual or objective items; studying the individual 
instructor’s peculiarities and testing habits; making up imaginary 
test items beforehand; studying any available test questions on the 
subject; and reading about the theory of objective test construction. 
Besides the foregoing, specific aids were listed for specific types of 
objective examinations. 

Terry® has demonstrated that study habits are related to the 
type of question used in examinations. In an investigation of how 
college students study for essay and objective tests he found that 
when they review for the former “they look for the main points and 
endeavor to strengthen their grasp of the subject-matter in large 
units taken as wholes.” In reviewing for the latter, on the contrary,
they look for details and work with small units. 

In another investigation Terry® attempted to discover whether 
college students review for the listing-recall type of examination in 
the same manner as they review for completion or for true-false tests, 
and if not, what methods of study are adapted to each type. On the 
basis of the students’ reports lists of the twenty methods best adapted 
to listing-recall, completion, and true-false tests were made. For the 
listing-recall test these methods dealt with large units of subject- 
matter such as chapters and outlines which required earnest, persistent 
and well organized thought. For the true-false test the methods were 
those which dealt primarily with facts and definitions and with the 
authors and findings of experiments. For the completion test only 
one method predominated and it emphasized word for word mastery 
of important ideas. The author concludes: 


The kind of test to be given, if the students know it in advance, determines in 
large measure both what and how they study. The behavior of students in this 
habitual way places greater powers in the teacher’s hands than many realize. By 
the selection of suitable types of tests, the teacher can cause large numbers of his 
students to study, to a considerable extent at least, in the ways he deems best for a 
given unit of subject-matter. Whether he interests himself in the question or not, 
most of his students will probably use the methods of study which they consider
best adapted to the particular types of tests customarily employed. 


The author himself remarks that the data in these studies are not 
concerned with the methods which students employ in studying new 
material although it is certainly not unlikely that these methods also 
are affected by the nature of the tests which the instructor gives. 

Douglass and Talmadge’ have also studied the methods of preparing 
for objective examinations at the college level. They find that, in 
preparing for the new type of examination, students learn tables and 
minute details, learn the words of the book, and pay little attention 
to formulating a personal opinion. On the other hand, students who 
are preparing for an essay test think about the content, formulate a 
personal opinion, read and review generalities and trends, try to
understand underlying relationships, attempt to draw important 
conclusions from tables, and read their notes on text and lecture with- 
out picking out details. 

For the past five years the writer has felt that the most important 
factor in evaluating the various types of examinations may be the 
mental set produced by studying for any one type of test. This 
may determine the study habits of the individual and hence influence 
to a high degree both the amount of material that he will learn and 
the amount he will retain. Some writers seem to appreciate this issue 
in part. For example, Ruch’ in his statement of the limitations of 
objective examinations says that the objective test measures recogni- 
tion rather than recall. However, he does not believe that this means 
that the older type of examination is superior for he holds that “we
do not know what knowledge should always be at our finger tips and
what knowledge will suffice if it enables us to select truth from error
when both are presented.” Such writers do not seem to see that the
real problem is much wider in scope—that the knowledge of the 
way in which the individual is to be tested may determine his method 
of studying, and that some methods of studying may be more effective 
than others, no matter what the type of test is with which the individ- 
ual is actually tested. 

The purpose of the present study is to determine the influence of 
certain examination sets on immediate and delayed memory. The 
mental sets to be used during learning are what might be called the 
“recognition” and the “recall” examination sets. Two specific
kinds of each type of set will be studied which will be called hereafter 
the true-false and multiple-choice, and the completion and essay types. 
These sets are so named because they are the ways in which the subjects
of the study are going to be told they are to be examined for
immediate and delayed memory. In other words, the mental set 
during learning will be for some particular type of examination. Thus, 
the general problem is the determination of the influence of the 
expectancy of a certain type of test on immediate and delayed memory, 
a question which until the present time has not been investigated 
experimentally. The nearest approach to such a study was made by 
Jersild* who studied the effects of pre-examinations of various sorts on 
learning. He found that groups which had multiple-choice and essay 
questions in the pre-examinations made consistently higher scores than 
the group which had true-false questions. 

The specific questions which the present paper will attempt to
answer are: First, is an individual’s immediate memory for a certain
type of sense material affected by the fact that he believes he is to
be tested in a certain manner for the material learned; second, is an
individual’s delayed memory for the same material affected by his
learning set; and, third, is there any difference between immediate
and delayed memory dependent on the type of set at the time of learning.
In a subsequent paper the question of the relationship between
the manner in which the individual studies and his examination set
will be reported.


EXPERIMENT 


The Subjects.—One hundred twenty-four students served as subjects
in this experiment. These students came from the first semester
of the year course in elementary psychology and the course in abnormal 
psychology at the University of Michigan during the school year 
1932-1933. Four groups of individuals, called the true-false (T-F), 
completion (Comp.), multiple-choice (M-C), and essay groups were 
set up and equated as nearly as possible on the following bases: Honor 
point average, year in college, sex, the number of college credit hours 
completed in American history, and whether the subjects had or had 
not already served as subjects in that part of this investigation in 
which nonsense material had been used as learning material. 

The Learning Material.—This consisted of two mimeographed
chapters from a volume by Ropes* on campaigns during the Civil War. 
The maps present in the original material were omitted as the experi- 
menter wanted to see if his subjects, during learning, would draw 
maps of their own to aid them in the organization of the material. 
This particular type of learning material was chosen for a number of
reasons. In the first place it contains an abundance of factual 
material which can be readily tested. The fact that the material is 
of a factual nature made the construction of the various objective 
tests a fairly simple matter. It also made the scoring of the essay 
examinations for the facts contained in them rather easy. In the 
second place, though primarily factual, this material does lend itself to 
organization. This is an important item since one of the advantages 
often cited in favor of the essay test is that it tests the organization of 
learned materials whereas the objective tests do not. In the third 
place, students are unfamiliar with this type of material. This is an 
important factor since it helps insure an equivalence of background in 
the individuals of the various groups. 

The Tests.—These were of four types: essay, completion, true-false,
and multiple-choice. The essay test consisted merely of two questions
which were:


I. Tell all you can about the Battle of Cedar Mountain. 
II. Tell all you can about the movements on the Rappahannock. 


The completion, true-false, and multiple-choice tests were made 
up of one hundred questions each. The hundred items were drawn 
about equally from each page of mimeographed material—six or seven 
items coming from each of the fifteen pages of mimeographed material. 
The questions covered identically the same material in the case of each 
type of test. The only difference between them lay in the fact that 
the items were cast in completion, true-false or multiple-choice 
form. In drafting the objective test items the author as far as pos- 
sible followed the rules which have been laid down by Ruch’ for the 
construction of the various types of test items. In order to insure that 
these rules were followed all items were reviewed by two other individuals
who had the rules in mind. The following is a typical item cast in
completion, true-false and multiple-choice forms respectively: 


Pope could not retire behind Culpeper without sacrificing his communications
with ________.

Pope could not retire behind Culpeper without sacrificing his communications 
with Jackson. 

Pope could not retire behind Culpeper without sacrificing his communications 
with (Sigel, Jackson, Greene, Marshall). 


Two other factors must be considered with reference to these tests— 
their validity and their reliability. The tests were validated by the 

646 The Journal of Educational Psychology 


method of judgments. The author and a graduate student in psychology
made lists of what they considered important items in the
learning material. Those items which seemed most adequate to
both individuals were then selected and cast in the various question
forms. A final judgment was made on the questions by a graduate
student majoring in American history.
The reliability or consistency of the tests was determined by
correlating the scores on the tests given one day after the last learning
period with the scores on the same tests five weeks later. The coefficients
of correlation found are given in Table I. Two coefficients
were found for the multiple-choice test. The first was obtained by
correlating the total number of right responses on the first test with the
total number of right responses on the second test (“Rights” in the
table). In the second case the scores on both the first and second
tests were corrected for chance using the formula $S = R - \tfrac{1}{3}W$.
The corrected scores were then correlated.
Two coefficients were also found for the true-false test. One
was determined in which the scores correlated were uncorrected for
chance (“Rights” in the table), and one in which the scores were
corrected for chance (R − W in the table).
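
The two corrections for chance mentioned above can be illustrated with a short Python sketch. It is not the author's procedure: the function name `corrected_score` and the sample counts are hypothetical, and the general rule S = R − W/(k − 1) for items with k options is an assumption made here. It reduces to R − W for true-false items (k = 2) and to R − W/3 for four-option multiple-choice items such as the sample item shown earlier.

```python
from fractions import Fraction


def corrected_score(rights: int, wrongs: int, n_options: int) -> Fraction:
    """Score corrected for chance, S = R - W / (k - 1) (assumed general rule)."""
    if n_options < 2:
        raise ValueError("an item needs at least two options")
    return Fraction(rights) - Fraction(wrongs, n_options - 1)


# Hypothetical paper: 70 items right, 24 wrong, 6 omitted out of 100.
true_false = corrected_score(70, 24, n_options=2)       # R - W
multiple_choice = corrected_score(70, 24, n_options=4)  # R - W/3
print(true_false, multiple_choice)  # 46 62
```

Omitted items are counted neither as right nor as wrong, so only the wrong responses reduce the score.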
In the case of the essay tests three consistency coefficients were
determined. The first was obtained by correlating the scores on the
same points tested on the other examinations on the two tests (“Necessary
Points” in the table). The second was obtained by correlating
the scores on items other than those tested by the other examinations
on the two tests. The third was obtained by correlating the ratings
as to organization of the material on the two examinations.
Only one coefficient was found for the completion test: the
correlation of the scores on the first and second givings of the test.
TABLE I.—CORRELATIONS BETWEEN THE FIRST AND SECOND TESTS FOR THE GROUP
OF ONE HUNDRED TWENTY-FOUR SUBJECTS

[illegible] ........................ .64 ± .04
[illegible] ........................ .77 ± .02
[illegible] ........................ .44 ± .05
[illegible] ........................ .54 ± .05
[illegible] ........................ .66 ± .03
[illegible] ........................ .56 ± .04
[illegible] ........................ .67 ± .03
[illegible] ........................ .75 ± .03
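
The figures attached to each coefficient in Table I are not explained in the text. A plausible reading, offered here only as an assumption, is that they are probable errors computed by the classical formula PE = 0.6745(1 − r²)/√N. The sketch below (the function name `probable_error_of_r` is hypothetical) applies that formula to three of the tabled coefficients with N = 124.

```python
import math


def probable_error_of_r(r: float, n: int) -> float:
    """Classical probable error of a correlation coefficient (assumed formula)."""
    return 0.6745 * (1.0 - r * r) / math.sqrt(n)


N_SUBJECTS = 124  # size of the group named in the caption of Table I

for r in (0.64, 0.77, 0.44):
    print(f"r = {r:.2f} ± {probable_error_of_r(r, N_SUBJECTS):.2f}")
```

Under this assumption the three coefficients come out as .64 ± .04, .77 ± .02 and .44 ± .05, in agreement with the corresponding entries of the table.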


The Experimental Procedure.—All of the subjects when they arrived
at the laboratory for the first time were put in one of the two experimental
sections. One section carried out the sense experiment and the
other carried out the nonsense experiment. In the following discussion
only the procedure for the sense experiment will be considered
except for certain points made necessary by the fact that the individuals
who went through the nonsense experiment first, later became
subjects for the sense experiment.


Each of the subjects was given a slip of paper on which a number
and the following statement were written: “You are to study the
material given you for a true-false (essay, completion, or multiple-choice)
test.”


Following this the mimeographed booklets containing the learning 
material were handed out. The subjects were then told: 


The experiment which you are going to carry out is one whose purpose is to 
try to discover whether it is better to study for a true-false, essay, completion, or 
multiple-choice test. You will have three two-hour periods to study the material 
which has been given to you in order to prepare yourselves for examinations on this 
material of the type written on the slip of paper given you when you entered the 
laboratory. These examinations will be given the day after and five weeks after 
your last learning period. You are to study this material in the way you would 
ordinarily study it for the type of examination assigned to you. On the black- 
board in front of you are examples of the various types of questions. You are to 
be tested with that type of question which corresponds to the kind designated on 
the paper given you. If you have any questions concerning this matter ask the 
experimenter about them at the end of this discussion before you start studying 
the material. 

The mimeographed booklet that I have given you is yours during the time 
you are in this laboratory. Put the number which you will find on the slip of paper 
on it so that the same booklet may be returned to you at the beginning of each 
study period. For your use I have put some note-book paper on the table in the 
front of the room. Before you leave after each study period be sure that you put 
your number on any notes that you have taken, and put the notes inside of the 
booklet so that you can use them during the next study period if you want to. 
Otherwise all I have to say to you is do not discuss this experiment with anyone 
else. 


The subjects who had already gone through the nonsense experi- 
ment and were therefore no longer naive when they came to this 
experiment, as they had been given similar directions in that experi- 
ment, were told: 


You will study the material which will be given to you for the same type of
examination which you prepared for in the preceding experiment. Despite the
fact that you were tested in that experiment with all types of examinations when
you expected only one type, I expect you to study with just one type of examination
in mind. You must remember that this is an experiment the results of which
are of value only if you follow the directions given to you.


With the exception of the first paragraph, the directions given to this
group were the same as those given to the first group.

The three two-hour study periods were held in the late afternoon 
and the evenings—in the afternoons from 3:30 to 5:30 o’clock and in 
the evenings from 7:30 to 9:30 o’clock. The first and second study 
periods were separated by two days, the second and third by five days. 
All of the study periods were held in the same room under fairly con- 
stant conditions. The study periods were supervised by either the 
experimenter or his assistant in order to keep disturbances at a min- 
imum and to keep the subjects at the task in hand. At the close of 
each study period the numbered booklets with the notes which they 
contained were collected from the students as they left the room. This 
was done to prevent the subjects from studying the material outside
of the study periods. The name of the author of the material had been
omitted from the booklet in order to prevent the subjects from looking
up the material in a library. As an added precaution the experimenter
kept the only copy of the book which he could find in his own office.

At a group’s regular time (afternoon or evening) on the day follow- 
ing the last learning period, it met in the same laboratory room to take 
the tests. In the case of any given group of subjects (true-false, 
multiple-choice, completion or essay) the tests were given in one of 
four orders: true-false, multiple-choice, completion, essay; multiple-choice,
completion, essay, true-false; completion, essay, true-false,
multiple-choice; or essay, true-false, multiple-choice, completion. This
was done in order that the practice effect resulting from the first tests 
in the series would be equalized on the remainder of the tests. Each 
order of tests appeared approximately an equal number of times for 
each of the four groups of subjects, as it had been determined in 
advance by the experimenter. The subjects were allowed one hour 
for the essay test. No time limits were stated for the three objective 
tests. Most of the subjects, however, finished all three of them in less 
than an hour. 
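
The four test orders listed above are the four rotations of a single sequence, so that each test falls once in each serial position. The Python sketch below reproduces them; the helper names and the round-robin assignment of orders to subjects are illustrative assumptions, since the text says only that the orders were fixed in advance and used about equally often.

```python
from collections import Counter

TESTS = ["true-false", "multiple-choice", "completion", "essay"]


def cyclic_orders(tests):
    """Return every rotation of `tests`, one order per starting test."""
    return [tests[i:] + tests[:i] for i in range(len(tests))]


def assign_orders(subject_ids, orders):
    """Deal the orders out in turn so that each is used about equally often."""
    return {sid: orders[i % len(orders)] for i, sid in enumerate(subject_ids)}


orders = cyclic_orders(TESTS)
assignment = assign_orders(range(1, 32), orders)  # a hypothetical group of 31
usage = Counter(order[0] for order in assignment.values())

for order in orders:
    print(", ".join(order))
```

With thirty-one subjects the four orders cannot be used exactly equally often; the round-robin rule keeps the counts within one of each other.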

The same testing procedure was followed when the students took 
the same tests after a five week interval. At that time each student 
took the tests in the same order as that in which he had first taken 
them. 

Scoring of the Tests.—Scoring keys were made out by the experimenter
for each of the objective types of examination. These were
given to assistants each of whom corrected all of the examinations of
one kind. A second group of assistants went through the tests in 
order to check the scoring. The scorers of the completion tests were 
two individuals who were thoroughly familiar with the material. 
This, of course, was necessary because they had to give credit for any 
response that was correct even though it was not given in the same 
words as appeared in the scoring key. In cases of doubt the scorer 
placed a question mark on the top of the test paper. The question- 
able answers were then read very carefully by the experimenter and 
by both scorers and when two of the three decided that the answer 
should be given credit it was counted as correct. 
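
The rule for doubtful completion answers amounts to a two-out-of-three majority among the experimenter and the two scorers. A minimal sketch, with a hypothetical function name and hypothetical votes:

```python
def credit_doubtful_answer(votes):
    """Return True when at least two of the three judges accept the answer."""
    if len(votes) != 3:
        raise ValueError("expected one vote from each of three judges")
    return sum(bool(v) for v in votes) >= 2


# Votes of the experimenter and the two scorers on one questionable answer.
print(credit_doubtful_answer([True, False, True]))   # True
print(credit_doubtful_answer([False, False, True]))  # False
```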

The essay tests were scored in three different ways—one fairly 
objective and two subjective ways. The objective type of scoring 
was on what is called “necessary points.” By scoring for necessary
points is meant scoring the examinations for the same 100 points 
tested in the objective examinations. The two subjective methods of 
scoring were on what are called “other items” and “organization.” 
By scoring for other items is meant scoring the examination for every 
fact other than those which came under necessary points. The 
organization score was a rating of the write-up of the questions. 

Six scorers graded all of the answers to the essay questions on 
these three bases. All of them studied the learning material for the 
same amount of time as the experimental groups in order to become 
thoroughly familiar with it. They were given a list of the one hundred 
points tested on the objective examinations and told that they were 
to give each of those points included in the answer one point provided 
that it was stated absolutely correctly. In the scoring of other items 
the scorers were told that they were to give every point which they 
considered of importance not included under necessary points a score 
of one if it was stated absolutely correctly. The organization of the 
answers was rated on the following basis. The raters were told to give 
a paper on which facts were stated in haphazard fashion with no organ- 
ization whatsoever a zero rating. A paper where the organization was 
as good as that of the learning material was to be rated a ten; and a 
paper whose organization was as good as that which the scorers thought 
they could have written was to be rated a five. Otherwise the rating 
was left to the judgment of the raters. 

The results on the essay tests in the following section are all based 
on the averages of the scorers. It is to be noted that the three types of 
scoring on each essay examination were made absolutely independently 
by each scorer. No marks were made on the examination papers
themselves by the graders—the points being tallied on a separate 
sheet of paper. That the average total scores which are used here- 
after are reliable was determined by correlating the average scores 
of three of the scorers with the average scores of the other three scorers 
on the first and on the second tests for the three types of scores. These
coefficients were found to range in size from .82 ± .03 to .95 ± .01.
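
The check on the scorers described above, in which the average of three scorers is correlated with the average of the other three, might be sketched as follows. The marks are invented for illustration, the function names are hypothetical, and the choice of which three scorers form each half is an assumption, since the text does not say how the six were divided.

```python
import math


def pearson_r(x, y):
    """Product-moment correlation between two equally long lists."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxy = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    sxx = sum((a - mean_x) ** 2 for a in x)
    syy = sum((b - mean_y) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)


def split_scorer_reliability(scores):
    """Correlate the mean of scorers 1-3 with the mean of scorers 4-6.

    `scores` holds one row per paper, each row giving the six scorers' marks.
    """
    first_half = [sum(row[:3]) / 3 for row in scores]
    second_half = [sum(row[3:]) / 3 for row in scores]
    return pearson_r(first_half, second_half)


# Hypothetical "necessary points" marks for five papers from six scorers.
marks = [
    [12, 14, 13, 13, 12, 15],
    [20, 22, 19, 21, 23, 20],
    [7, 9, 8, 6, 9, 8],
    [16, 15, 17, 18, 15, 16],
    [25, 27, 26, 24, 28, 27],
]
r = split_scorer_reliability(marks)
print(f"agreement between the two halves of the scorers: r = {r:.2f}")
```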

Results.—The experimental data on the true-false tests are given
in Table II. This table gives the number of cases in the distribution, 
the means of the distributions, the standard deviations of the distribu- 
tions, and the standard deviations of the means for the true-false, 
multiple-choice, completion, and essay groups on the true-false tests 
both when uncorrected (U) and corrected (C) for chance. 


TABLE II.—EXPERIMENTAL DATA FOR THE VARIOUS GROUPS ON THE TRUE-FALSE TESTS

                                  First test               Second test
Group                    N     Mean     SD    SD_M     Mean     SD    SD_M
True-false (U)          31    68.98    8.47   1.52    50.52   13.91   2.50
True-false (C)          31    42.19   14.81   2.66    19.94   16.07   2.89
Multiple-choice (U)     31    70.08    7.99   1.43    51.48   12.52   2.25
Multiple-choice (C)     31    42.68   15.08   2.71    20.03   13.87   2.49
Completion (U)          30    69.23    8.24   1.50    60.70   12.24   2.23
Completion (C)          30    41.70   14.68   2.68    30.20   14.76   2.69
Essay (U)               32    68.31    6.36   1.12    61.13    7.71   1.36
Essay (C)               32    41.63   10.52   1.86    30.47   10.90   1.93
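
The standard deviations of the means in Table II are consistent with the usual relation of the standard deviation of a distribution divided by the square root of the number of cases. The text does not state this formula, so it is an assumption here, and the function name is hypothetical. The sketch checks it against three rows of the first test.

```python
import math


def sd_of_mean(sd: float, n: int) -> float:
    """Standard deviation of a mean, SD / sqrt(N) (assumed relation)."""
    return sd / math.sqrt(n)


# (N, SD, tabled SD of the mean) for three first-test rows of Table II.
rows = [(31, 8.47, 1.52), (31, 14.81, 2.66), (32, 6.36, 1.12)]

for n, sd, tabled in rows:
    print(f"N = {n}, SD = {sd}: computed {sd_of_mean(sd, n):.2f}, tabled {tabled:.2f}")
```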











From these data critical ratios (a critical ratio is the ratio of the
actual difference between two measures to the standard deviation
of that difference*) were computed in order to determine whether a
true difference exists between the means of the various groups. In
the present case the difference between the means of two groups was
found on a given test (first or second) scored in a given manner (uncorrected
or corrected for chance). The critical ratio was then determined
by dividing the difference found by the standard deviation
of that difference. Table III gives the critical ratios for the true-false
tests. Table III and all subsequent tables of the same nature
are to be read as follows. The items in the columns are compared with
the items in the rows so that the critical ratio, −.53 in the first column
and third row means that the difference in means between the true-false
and multiple-choice examination-set groups on the first true-false
test when the scores were uncorrected for chance is against the true-false
group, i.e., the mean of the true-false group is lower than that of
the multiple-choice group (the signs used in front of the ratios are
merely aids in reading the table).

* $SD_{\text{difference}} = \sqrt{(SD_1)^2 + (SD_2)^2}$, where $SD_1$ is the standard deviation of
the first measure and $SD_2$ the standard deviation of the second measure.
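
The computation of a critical ratio can be followed in a short Python sketch. It assumes that the two "measures" are group means and that their standard deviations are the standard deviations of the means given in Table II; the function name is hypothetical. The figures used are those of the first and third rows of Table II on the first test, taken here to be the true-false and multiple-choice groups with scores uncorrected for chance.

```python
import math


def critical_ratio(mean_a, sd_mean_a, mean_b, sd_mean_b):
    """Difference between two means divided by the SD of that difference."""
    sd_difference = math.sqrt(sd_mean_a ** 2 + sd_mean_b ** 2)
    return (mean_a - mean_b) / sd_difference


# True-false group (68.98, SD of mean 1.52) against the
# multiple-choice group (70.08, SD of mean 1.43) on the first true-false test.
cr = critical_ratio(68.98, 1.52, 70.08, 1.43)
print(round(cr, 2))  # -0.53
```

The result agrees with the ratio of −.53 cited in the text, the negative sign showing that the true-false group has the lower mean.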

A critical ratio of three indicates for the purpose of this paper 
the complete reliability of a difference. Actually, the chances are 
nine thousand nine hundred eighty-six in ten thousand that a true 
difference exists between the two measures compared and is in the 
direction found. The following conclusions seem warranted from the 
data in Table III. 


TABLE III.—COMPARING THE VARIOUS GROUPS ON THE TRUE-FALSE TESTS IN
TERMS OF THEIR CRITICAL RATIOS

                                  First test                   Second test
Group                     T-F     M-C   Completion      T-F     M-C   Completion
True-false (U)            ...     ...      ...          ...     ...      ...
True-false (C)            ...     ...      ...          ...     ...      ...
Multiple-choice (U)      −.53     ...      ...         −.29     ...      ...
Multiple-choice (C)      −.13     ...      ...         −.02     ...      ...
Completion (U)           −.12     .41      ...        −3.04   −2.91      ...
Completion (C)            .13     .26      ...        −2.60   −2.77      ...
Essay (U)                 .35     .98      .49        −3.72   −3.68     −.16
Essay (C)                 .17     .32      .02        −3.03   −3.15     −.08














1. The four examination-set groups did about equally well on the 
true-false test which was given on the day following the last learning 
period. No differences were found among any of the four groups which 
were reliable or which even approached reliability whether the scores 
on the tests were uncorrected or corrected for chance. 

2. The essay attitude group was superior to both the true-false 
and multiple-choice attitude groups on the true-false test which 
was given five weeks after the last learning period. The differences 
are statistically reliable whether the scores are uncorrected or corrected 
for chance. 
3. The completion attitude group was superior to both the true-false
and multiple-choice attitude groups on the delayed true-false test.
These differences approached significant reliability especially when 
the scores on the tests were uncorrected for chance. 

4. There was no difference whatsoever between the true-false 
attitude group and the multiple-choice attitude group on the second 
true-false test whether the scores were uncorrected or corrected for 
chance.

5. There was no difference between the completion attitude group 
and the essay attitude group on the second true-false test whether the 
scores were uncorrected or corrected for chance. 
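
The criterion of three adopted in reading Table III, with its chances of nine thousand nine hundred eighty-six in ten thousand, can be checked against the normal curve. The sketch assumes that the figure is the one-sided normal probability for a deviation of three standard deviations, which the text does not say explicitly; the function name is hypothetical.

```python
import math


def chance_of_true_difference(critical_ratio: float) -> float:
    """Normal-curve probability that the difference lies in the direction found."""
    return 0.5 * (1.0 + math.erf(abs(critical_ratio) / math.sqrt(2.0)))


p = chance_of_true_difference(3.0)
print(f"{p * 10_000:.1f} in 10,000")  # 9986.5 in 10,000
```

On the same assumption a ratio of two corresponds to roughly 9,772 chances in ten thousand, which shows how much more exacting the criterion of three is.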

The experimental data on the multiple-choice tests are given in 
Table IV. The critical ratios computed from these data are to be 
found in Table V. Table V shows that: 

1. The four different attitude groups did about equally well on the 
multiple-choice test which was given on the day following the last 
learning period. No reliable differences or differences which even 
approach reliability were found whether the scores on the tests were 
uncorrected or corrected for chance. 

2. The essay attitude group was superior to the multiple-choice, 
true-false and completion attitude groups on the multiple-choice 
test which was given five weeks after the last learning period. The 
differences between the essay group and the other three groups are, 
however, not completely reliable. The differences are most reliable 
where the essay group is compared with the multiple-choice and true- 
false groups. This is particularly true where the scores on the 


TABLE IV.—EXPERIMENTAL DATA FOR THE VARIOUS GROUPS ON THE MULTIPLE-CHOICE
TESTS

                                  First test               Second test
Group                    N     Mean     SD    SD_M     Mean     SD    SD_M
Multiple-choice (U)     31    62.23   11.30   2.03    54.03   12.87   2.31
Multiple-choice (C)     31    52.00   14.46   2.60    41.45   15.03   2.70
True-false (U)          31    65.06   10.63   1.91    52.97   10.43   1.87
True-false (C)          31    54.32   13.41   2.41    41.16   11.46   2.06
Completion (U)          30    64.20   10.68   1.95    56.60   10.30   1.88
Completion (C)          30    52.80   14.10   2.57    44.60   12.57   2.30
Essay (U)               32    63.34    7.14   1.26    56.97    6.82   1.20
Essay (C)               32    52.94    9.03   1.60    48.34    8.10   1.43








Old and New Types of Examination 


653 


TABLE V.—COMPARING THE VARIOUS GROUPS ON THE MULTIPLE-CHOICE TESTS
IN TERMS OF THEIR CRITICAL RATIOS

                                       First test                            Second test
Group                      M-C          T-F      Completion      M-C     T-F   Completion
Multiple-choice (U)        ...          ...         ...          ...     ...      ...
Multiple-choice (C)        ...          ...         ...          ...     ...      ...
True-false (U)         [illegible]      ...         ...         1.06     ...      ...
True-false (C)         [illegible]      ...         ...          .06     ...      ...
Completion (U)         [illegible]  [illegible]     ...         −.86   −1.37      ...
Completion (C)            −.22      [illegible]     ...         −.89   −1.11      ...
Essay (U)                 −.05          .75         .87        −1.13   −1.80     −.37
Essay (C)                 −.31          .48        −.05        −2.26   −2.86    −1.38























multiple-choice test have been corrected for chance. Since the con- 
sistency coefficient of the multiple-choice test was found to be highest 
when the scores were corrected for chance the critical ratios based on 
the corrected scores should be given most weight. Hence the differ- 
ences found between the essay group and the multiple-choice and true- 
false groups may be taken to indicate a true difference. The difference 
between the essay attitude group and the completion attitude group 
although larger in the case where the scores have been corrected for 
chance does not approach statistical significance. 

3. The completion attitude group is superior to the multiple-choice 
and true-false groups on the second multiple-choice test. However, 
the differences are not statistically significant whether the scores are 
uncorrected or corrected for chance. 

4. There seems to be no difference between the multiple-choice 
attitude group and the true-false attitude group particularly where the 
scores have been corrected for chance. 
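
The correction for chance that these comparisons depend on is not spelled out in this section. A minimal sketch, assuming the conventional rights-minus-wrongs rule S = R - W/(k - 1) for items with k alternatives (which reduces to rights minus wrongs for true-false items); the function name and the sample counts are hypothetical and are not taken from the study:

```python
def corrected_score(right: int, wrong: int, options: int) -> float:
    """Conventional correction for guessing (assumed, not quoted from the study)."""
    if options < 2:
        raise ValueError("an item needs at least two alternatives")
    # Each wrong answer is taken to stand for 1/(options - 1) lucky guesses.
    return right - wrong / (options - 1)


# Hypothetical pupil: 60 items right, 20 wrong, omitted items ignored.
true_false = corrected_score(60, 20, options=2)    # rights minus wrongs
five_choice = corrected_score(60, 20, options=5)   # rights minus a fourth of the wrongs
```

Under this rule the same raw record is penalized more heavily on a two-alternative test than on a five-alternative one, which is one reason corrected and uncorrected scores can rank the groups differently.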

The experimental data on the completion tests are given in Table 
VI. The critical ratios computed from these data are shown in Table 
VII. 
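
The relation between the two tables can be sketched as follows. This assumes the usual critical ratio for two independent groups, the difference between the means divided by the square root of the sum of the squared standard errors; the section does not state the formula, so it is offered only as a check. The figures are the second-test means and standard errors of the mean printed in Table VI for the completion group and for the group in the row beneath it, and the function names are illustrative.

```python
import math


def standard_error(sd: float, n: int) -> float:
    """Standard error of a mean: SD divided by the square root of N."""
    return sd / math.sqrt(n)


def critical_ratio(mean_a: float, se_a: float, mean_b: float, se_b: float) -> float:
    """Difference between two means over the standard error of that difference."""
    return (mean_a - mean_b) / math.sqrt(se_a ** 2 + se_b ** 2)


# Second completion test, Table VI: completion group against the next row.
cr = critical_ratio(49.00, 2.39, 39.13, 2.27)

# The tabled standard error for the completion group follows from its SD and N.
se_completion = standard_error(13.11, 30)

print(f"critical ratio = {cr:.2f}, standard error = {se_completion:.2f}")
```

Under these assumptions the ratio agrees with the 2.99 entered in Table VII, and the standard error agrees with the 2.39 entered in Table VI.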

The results indicated in Table VII are as follows:

1. The completion attitude group is superior to all three of the other 
attitude groups both on the first and the second test. The difference 
between the completion attitude group and the true-false attitude 
group approaches the criterion for reliability and is reliable in the case 
of the second test. The difference between the completion group and 

TABLE VI.—EXPERIMENTAL DATA FOR THE VARIOUS GROUPS ON THE COMPLETION TESTS

                           First test               Second test
Group              N  | Mean  | SD    | SD_M | Mean  | SD    | SD_M
Completion         30 | 58.10 | 13.59 | 2.48 | 49.00 | 13.11 | 2.39
True-false         31 | 50.84 | 12.45 | 2.24 | 39.13 | 12.63 | 2.27
Multiple-choice    31 | 49.77 | 12.69 | 2.28 | 42.13 | 14.55 | 2.63
Essay              32 | 52.37 | 12.15 | 2.15 | 44.41 | 10.41 | 1.84

TABLE VII.—COMPARING THE VARIOUS GROUPS ON THE COMPLETION TESTS IN TERMS OF THEIR CRITICAL RATIOS

                              First test                       Second test
Group              Completion  | TF          | MC    | Completion | TF    | MC
Completion         .....       | .....       | ..... | .....      | ..... | .....
True-false         [illegible] | .....       | ..... | 2.99       | ..... | .....
Multiple-choice    2.56        | [illegible] | ..... | 1.94       | -.86  | .....
Essay              1.84        | -.50        | -.83  | 1.52       | -1.81 | -.71

the multiple-choice group, on the other hand, more nearly approaches 
complete reliability on the first test. The same is true of the difference 
between the completion attitude group and the essay attitude group 
although here the differences are still less reliable. These differences 
are all large enough, however, to indicate that a true difference may be 
present. 

2. The essay attitude group is superior to the true-false and 
multiple-choice attitude groups on both the first and second tests. 
However, the differences are not reliable except that there is possibly 
an indication of a true difference between the essay group and the true- 
false group on the second test. 

3. There is no evidence of any reliable difference between the true- 
false and multiple-choice groups on either the first or second tests. 

Table VIII gives the experimental data on the essay tests for 
necessary points (N), other items (J), and organization (O). The 
critical ratios computed from these data are to be found in Table IX. 
With regard to necessary points Table IX shows that: 

1. The essay attitude group is superior to the three other groups
on the first and second essay tests. In the case of the first test the
differences between the essay group and each of the other groups are
reliable. In the case of the second test only the difference between 
the essay group and the true-false group is reliable. However, the 
other two differences approach statistical significance, the chances 
being better than ninety-nine in one hundred that there is a true 
difference present in both cases. 

2. No significant differences are present either between the com- 
pletion group and the true-false group or between the completion 
group and the multiple-choice group on the first and second tests. 

3. No significant differences are present between the true-false and 
multiple-choice groups on either the first or second tests. There is an 
indication, however, of a difference in favor of the multiple-choice 
group on the second test. 
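
The statement that the chances are better than ninety-nine in one hundred can be checked by converting a critical ratio into a normal-curve probability. The sketch below assumes a one-tailed reading of the normal curve; the section does not say how the chances were obtained, so this is only one plausible reconstruction. The two values are the second-test critical ratios for necessary points given in Table IX for the essay group against the completion group (2.53) and against the multiple-choice group (2.44).

```python
import math


def chances_of_true_difference(cr: float) -> float:
    """Normal-curve probability that the true difference is greater than zero."""
    return 0.5 * (1.0 + math.erf(cr / math.sqrt(2.0)))


for cr in (2.53, 2.44):
    chances = 100 * chances_of_true_difference(cr)
    print(f"critical ratio {cr:.2f}: {chances:.1f} chances in 100")
```

On this reading both ratios give more than ninety-nine chances in one hundred, while falling short of the ratio of about 3 that the text appears to treat as complete reliability.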

With regard to other items Table IX shows that: 

1. The essay attitude group is superior to the other three groups on 
the first and second essay tests. However, in neither the first nor the 
second test are these differences statistically reliable. In the case of 
the first test the differences tend to approach statistical significance. 
In the case of the second test only the difference between the essay 
group and the true-false group approaches statistical significance. 

2. No significant differences are present between the completion 
and either the true-false or multiple-choice attitude groups on the 
first and second tests. 

3. No significant differences are present between the true-false and 
multiple-choice attitude groups. However, just as in the case where 
these two groups were compared on necessary points there is an indica- 
tion of a difference in favor of the multiple-choice group on the second 
test. 

With regard to organization the following results are indicated by
Table IX: 

1. The essay attitude group is superior to the other three attitude 
groups on the first and second essay tests. The differences are not 
statistically completely reliable for either the first or second test. 
They do indicate, however, that a true difference may be present. 

2. No significant differences are present between the completion 
group and the true-false group or between the completion group 
and the multiple-choice group on the first or second tests. 

TABLE VIII.—EXPERIMENTAL DATA FOR THE VARIOUS GROUPS ON THE ESSAY TESTS

                                         First test                      Second test
Group                  N           | Mean        | SD   | SD_M        | Mean  | SD   | SD_M
Essay (N)              32          | 17.26       | 4.50 | .80         | 10.73 | 3.78 | .67
Essay (J)              32          | 6.26        | 2.38 | .42         | 2.95  | 1.57 | .28
Essay (O)              32          | 5.83        | 1.30 | .23         | 4.26  | 1.55 | .27
Completion (N)         30          | 12.78       | 5.64 | 1.03        | 7.82  | 5.07 | .93
Completion (J)         30          | 5.22        | 2.28 | .42         | 2.63  | 1.62 | .30
Completion (O)         30          | 4.98        | 1.69 | .31         | 3.52  | 1.72 | .31
True-false (N)         31          | 12.74       | 6.44 | 1.16        | 6.87  | 3.89 | .70
True-false (J)         31          | 4.97        | 2.31 | .41         | 2.33  | .92  | [illegible]
True-false (O)         [illegible] | [illegible] | 1.81 | [illegible] | 3.31  | 1.47 | .26
Multiple-choice (N)    31          | 13.10       | 5.31 | .95         | 8.19  | 4.46 | .80
Multiple-choice (J)    31          | 5.51        | 2.44 | .44         | 2.73  | 1.67 | .30
Multiple-choice (O)    31          | 5.25        | 1.67 | .30         | 3.80  | 1.87 | .34

TABLE IX.—COMPARING THE VARIOUS GROUPS ON THE ESSAY TESTS IN TERMS OF THEIR CRITICAL RATIOS

                                  First test                        Second test
Group                  Essay       | Completion  | TF    | Essay | Completion | TF
Essay (N)              .....       | .....       | ..... | ..... | .....      | .....
Essay (J)              .....       | .....       | ..... | ..... | .....      | .....
Essay (O)              .....       | .....       | ..... | ..... | .....      | .....
Completion (N)         [illegible] | .....       | ..... | 2.53  | .....      | .....
Completion (J)         [illegible] | .....       | ..... | .78   | .....      | .....
Completion (O)         [illegible] | .....       | ..... | 1.80  | .....      | .....
True-false (N)         3.21        | [illegible] | ..... | 3.98  | .82        | .....
True-false (J)         2.19        | [illegible] | ..... | 1.88  | .88        | .....
True-false (O)         1.82        | -.31        | ..... | 2.50  | .53        | .....
Multiple-choice (N)    3.35        | -.23        | -.24  | 2.44  | -.30       | -1.25
Multiple-choice (J)    1.23        | -.48        | -.90  | .64   | -.24       | -1.18
Multiple-choice (O)    1.53        | -.63        | -.30  | 1.07  | -.61       | -1.14

3. No significant differences are present between the true-false and
multiple-choice groups. However, here again there is an indication of
a difference in favor of the multiple-choice group on the second test.

DISCUSSION


These results make it obvious that the “recall examination set” 
groups were superior to the recognition set groups with the following 
exceptions. Neither recall group was superior to the recognition
groups on the immediate true-false and multiple-choice tests. Also
the completion group showed no superiority over the true-false and 
multiple-choice groups on the essay tests. 

The fact that the recall groups were not superior to the recognition 
groups on the immediate true-false and multiple-choice tests but were 
superior to those groups on the delayed true-false and multiple-choice 
tests is interesting. The true-false and multiple-choice tests are, of 
course, tests of recognition. The fact that there were no signif- 
icant differences between the recall and recognition groups on the first 
recognition tests, but that there were on the second test would seem to 
indicate that the rate of forgetting and hence the forgetting curve, as 
far as the process of recognition is concerned, is influenced by the 
examination set. In other words, forgetting for recognition is more 
rapid in the case of the recognition groups than for the recall groups. 
On the other hand, the results in the present investigation seem to 
indicate that the forgetting curve, as far as recall is concerned, is not 
influenced particularly by the examination set since the superiority 
of the recall set groups on the recall tests is of no greater and in most 
cases of not so great significance on the delayed tests. 

These indicated differences in the forgetting curve for recall and 
recognition between the recall and recognition examination set groups 
may possibly be explained quite simply. It would seem logical to 
suppose that the methods of study used by the individuals in the recog- 
nition groups prepare them chiefly for recognition whereas the methods 
used by the individuals in the recall groups prepare them for more than 
recognition. Hence with the lapse of a time interval after learning 
the materials which were only up to the recognition threshold in the 
case of the recognition groups have now fallen below that threshold; 
whereas materials which had been up to the recall threshold in the 
case of the recall groups, although they may now be below the recall 
threshold, are still above the recognition threshold. On the other 
hand, following the same line of reasoning, it is not to be expected 
that any more significant differences between the recall and recognition 
groups would be found on the delayed than on the immediate tests 
of recall because the materials which are up to the limen of recall in 
the recognition groups on the first test should not necessarily be
forgotten any more rapidly than the materials which are up to the limen
of recall in the individuals in the recall groups. 

To explain the differences among the recall and recognition groups 
on the various tests, the results should be considered in the light of 
the methods of study which were used by the various groups. These 
methods will be discussed in a subsequent article. 

However, no matter what determined these differences among the 
various “examination set” groups on the immediate and delayed tests
the important fact to be considered is that there were differences. If 
similar differences could be found using other kinds of sense material 
the practical implications would be obvious. There is no reason to 
believe that such or even greater differences would not be found if 
other types of sense material were used especially in view of the fact 
that the learning material used in the present experiment was of a kind 
very well adapted to objective testing since it was largely factual in 
nature. In other words, it seems logical to suppose that even greater 
differences between the various examination set groups would have 
been found if the learning material had been of a less factual nature and 
more of a general nature so that organization would play a larger part. 

From the practical side the results of the sense experiment indicate 
that for delayed recognition and for immediate or delayed recall it is 
best to study for a recall type of test. In other words, when equal 
amounts of time are spent in preparing for an examination it is much 
more economical to spend one’s time preparing for a recall type of 
examination because that type of preparation gives one a more com- 
plete mastery of the material as measured either in terms of recognition 
or of recall. If this is true it means that the constant use of recogni- 
tion types of objective examinations in the school situation is a very 
wasteful practice as the students will under such conditions always 
have a recognition examination set. The question of the relative 
value of recall and recognition tests then is not a matter merely of 
knowing “what knowledge should always be at our finger-tips and
what knowledge will suffice if it enables us to select truth from error
when presented.” It is much more than that: it is a matter of
economical learning.

It would seem that the only conditions under which recognition 
types of questions should be used are: (1) When recognition types of 
questions form only a part of an examination which contains recall 
questions so that the student must use the recall as well as the recognition
sets in studying; and (2) when the type of examination to be given
is unknown so that the student cannot study with a recognition set. 
This latter condition is most flagrantly violated in the school situation. 
Many teachers always give recognition tests so that the students come 
to expect them and therefore always study with a recognition set. 
The students in a class where testing is carried out under such condi- 
tions no doubt are penalized as to what they learn compared with 
students in another class who do not know how they are to be tested. 
It seems inconceivable that any one could argue for constant recogni- 
tion tests on the basis that in an entire course there is no material 
worthy of being learned well enough to be “at one’s finger tips.”

According to the results of the present investigation, an occasional 
true-false or multiple-choice test may be given when the student knows 
he is to be tested with such a test if the teacher feels that the material 
being tested is not of sufficient importance to warrant its being learned 
for more than immediate recognition. The important thing, however, 
is that these tests should not be given indiscriminately. The teacher 
must decide what material should be learned so that it can be recalled 
and recognized for as long a time as possible and what material should 
be learned for immediate recognition only. 

The difference between the two recall types of examination sets 
must also be considered from the practical standpoint. If all that 
is desired in the learning of materials is the simple recall of isolated 
facts when the essential cues related to a certain fact are given, then, 
from the results found in the present investigation, the completion 
examination set would seem to suffice. However, should it be desired 
that the learned material be recalled with a certain organization and 
without specific cues for every detail being supplied by the examiner, 
then it would seem that the essay type of examination set is most 
adequate. There certainly must be some parts of every course where 
the organization of the recalled material is as important as the recall 
of isolated facts and where the material should be well enough learned 
so that the students do not need cues in order to recall it. Where such 
is the case the student’s studying should be with an essay type examina- 
tion set. Here again it is the teacher who must decide. 

These same points must also be taken into consideration in compar- 
ing the essay with the true-false and multiple-choice examination sets. 
Not only do the latter two types of examination sets result in poorer 
recall of the material being learned than the essay examination set 
but they also discourage organization of the same material. 

SUMMARY

With respect to the learning of and memory for sense material
used under the conditions described the following conclusions seem
warranted.

1. The examination set which the individual has is of funda- 
mental importance both in the learning of and memory for sense 
material as is indicated by the following results. 

(a) For recognition tests of immediate memory, it makes no differ- 
ence whether recall or recognition examination sets are used in learning. 

(b) For recognition tests of delayed memory or for recall tests of 
immediate or delayed memory, recall examination sets are superior 
to recognition examination sets in learning. 

(c) For the mere recall of isolated facts where specific cues are 
given, the completion type of recall examination set is superior to any 
other type of examination set for either immediate or delayed memory. 

(d) For the recall of facts where specific cues are not given and for 
the recall of facts in an organized fashion, the essay type recall examina- 
tion set is superior to any other type of examination set. 

(e) For the recognition of whether a statement is true or false, for 
the recognition of the correct answer among a group of possible 
answers, or for recall of either type it seems to make little difference 
as to what type of recognition examination set is used. 

2. The examination set also seems to be of fundamental impor- 
tance in determining the rate of forgetting and thus the curve of for- 
getting as is indicated by the following results. 

(a) The rate of forgetting in the case of recognition is more rapid 
for learning with recognition examination sets than for learning with 
recall examination sets. 

(b) The rate of forgetting in the case of recall is about the same for 
learning with recognition examination sets as it is for learning with 
recall sets. 

3. If it can be assumed that other types of learning materials would 
give similar results the following practical implications of the results 
of this experiment seem indicated. 

(a) Since it is more economical, when a given amount of time is 
spent in studying, to use a recall examination set for delayed recogni- 
tion or immediate and delayed recall tests, recognition questions 
should be used in testing only when they form a part of the entire 
examination or when students are unaware that such questions are to
be used exclusively. 

(b) If the teacher feels it necessary that the students be able to 
recognize certain materials for a short time only, then the indications 
are that a recognition examination set may be used. This means that 
the teacher must evaluate the material in his course very carefully 
since recognition tests, if given indiscriminately, may have a deleterious 
effect on what the students ultimately retain of the course. 

(c) If the teacher feels it necessary that the students be able to 
recall isolated facts when specific cues are given as to the fact wanted, a 
completion examination set may be used with profit. 

(d) If the teacher wants the students to recall the material in an 
organized fashion and to know facts when cues are not given, the essay 
examination set should be used in preference to any objective type of 
examination set. Here again the teacher must evaluate the material 
which he presents in the light of what the student should learn from 
the course. 


A CRITICAL ANALYSIS OF THE PETERSON-THURSTONE WAR ATTITUDE SCALE


L. W. MILLER 
University of Denver 


While scoring the Peterson War Attitude Scale,¹ Form A, the writer
noted that certain items were checked which were rather far removed
in scale value from the mean scores of the subjects tested. This
suggested the desirability of a detailed item analysis based upon the
responses of those taking the test. In constructing this scale,
the method described by Thurstone² was followed by Peterson. The
scale value of an item is determined by the average position given 
by a large number of judges. The items are sorted into eleven piles. 
Those at the low end represent a pacifistic, and those at the upper end 
of the scale, a militaristic attitude. In selecting items for the final 
scale, only those are retained which show low variability of placement
by the judges. If the judges agree closely as to the scale value of a 
given item, it is retained, provided the item passes certain other 
tests which are of no concern in this investigation. 

The question raised here is whether or not the scores of college 
students who took the test agree with the scale values assigned by the 
judges. That is, if an item has an assigned scale value of 8.00, it
would seem reasonable to suppose that the mean scores of those checking
it should be above the average in militarism and that certainly
few, if any, who are extremely pacifistic should check it.

The subjects are two hundred ninety Freshmen and Sophomore col- 
lege students. The distribution of the scores is given in Table I 
below. The score of an individual is the mean of the scale values of all 
items checked. The twenty items constituting the scale are appended. 
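
The scoring rule just stated can be sketched in a few lines. The scale values below are a handful copied from Table II; the subject and the function name are hypothetical and serve only to illustrate the rule.

```python
from statistics import mean

# Item number -> scale value, a subset copied from Table II.
SCALE_VALUES = {4: 0.2, 7: 0.8, 2: 3.5, 14: 3.7, 1: 7.5, 15: 11.0}


def attitude_score(checked_items):
    """Score of one subject: the mean scale value of the items checked."""
    if not checked_items:
        raise ValueError("no items checked")
    return mean(SCALE_VALUES[item] for item in checked_items)


# Hypothetical subject who agrees with items 2, 7 and 14.
score = attitude_score([2, 7, 14])
print(f"score = {score:.2f}")
```

Because the score is an average, a subject who also checks one strongly militaristic item is pulled only part of the way up the scale, which is why the number and spread of items checked matter in what follows.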

The distribution of scores indicates that most of these college 
students score below 5.5 which is presumably half-way between extreme 
pacifism and extreme militarism. A group containing more individuals 
at the militaristic end of the scale would have been more satisfactory 
for this study. The four most militaristic items (Table II) as would 
be expected considering the nature of the group, were checked by very 
few subjects. Accordingly, they are, for the most part, omitted 
from consideration below. 

An examination of Table II shows a tremendous range in the 
number checking each item. (A check indicates that the subject 

TABLE I.—SCORES ON PETERSON-THURSTONE WAR ATTITUDE SCALE A
(N = 290)

Scores       | F
8.0-8.39     | 1
7.6-7.99     | 0
7.2-7.59     | 1
6.8-7.19     | 1
6.4-6.79     | 1
6.0-6.39     | 4
5.6-5.99     | 5
5.2-5.59     | 16
4.8-5.19     | 25
4.4-4.79     | 45
4.0-4.39     | 54
3.6-3.99     | 39
3.2-3.59     | 40
2.8-3.19     | 25
2.4-2.79     | 21
2.0-2.39     | 11
1.6-1.99     | 1

N = 290
AM = 4.03; Median = 4.06; Q1 = 3.35; Q3 = 4.64; Q = .65; SD = .98

agrees with the statement.) For example, one hundred sixty-seven 
check item number one, with a scale value of 7.5, while but forty- 
one check item ten, which is close to item one in scale value. Item 
five, with a scale value of 6.9 is checked by two hundred thirty-four, 
whereas item thirteen with a scale value of 6.8 is checked by one hun- 
dred sixty-seven students. Similar contrasts may be noted in com- 
paring items four and seven; eleven and eight; sixteen and three. 
Some items are apparently checked by the pacifistic as well as the 
more militaristic members of this group. Items two, seven, fourteen
and nineteen are each checked by over ninety per cent of the group. 
Items two, five, seven, nine, eleven, twelve, fourteen, seventeen and 
nineteen are each checked by seventy per cent or more of the subjects. 
These constitute nearly half of the scale items. These items vary in 
scale value from .8 to 6.9 or from very pacifistic to fairly militaristic. 
This would seem to indicate that these items do not discriminate 
highly. This is further substantiated by the range of mean scores, 
item two being checked by those ranging from 2.16 to 7.10 in mean 
score. The subjects checking item five, range in mean scores from 
2.57 to 8.16, and items seven and seventeen from 1.97 to 6.56. Items 

TABLE II.—SCALE VALUES; NUMBER CHECKING EACH ITEM; MEAN AND MEDIAN OF SCORES; QUARTILES; DEVIATIONS OF SCORES AND RANGE OF SCORES FOR THOSE CHECKING EACH ITEM

Item   | Scale | Number   | Mean  | Median      | Q1          | Q3          | Q           | Standard    | Range
number | value | checking | score | score       |             |             |             | deviation   | of scores
15     | 11.0  | 7        | 5.89  | 5.85        | .....       | .....       | .....       | .....       | 3.81
18     | 10.1  | 6        | 5.73  | 5.10        | [illegible] | [illegible] | [illegible] | [illegible] | 3.75
3      | 9.7   | 26       | 5.13  | 4.96        | 4.35        | 5.55        | .60         | 1.14        | 4.78
20     | 9.2   | 10       | 6.03  | [illegible] | [illegible] | [illegible] | [illegible] | [illegible] | 3.08
6      | 8.7   | 108      | 4.81  | 4.68        | 4.30        | 5.10        | .40         | .74         | 4.46
10     | 8.3   | 41       | 5.04  | 4.89        | 4.37        | 5.57        | .60         | 1.05        | 5.13
1      | 7.5   | 167      | 4.65  | 4.45        | 4.10        | 4.93        | .42         | .62         | 4.44
5      | 6.9   | 234      | 4.26  | 4.25        | 3.69        | 4.75        | .53         | .84         | 5.59
13     | 6.8   | 163      | 4.54  | 4.46        | 4.03        | 4.94        | .46         | .80         | 5.01
16     | 6.5   | 191      | 4.20  | 4.19        | 3.63        | 4.65        | .51         | .82         | 5.75
8      | 5.5   | 24       | 4.22  | 4.14        | 3.45        | 4.70        | .63         | 1.13        | 5.63
11     | 4.7   | 210      | 4.15  | 4.22        | 3.51        | 4.66        | .58         | .84         | 4.76
14     | 3.7   | 279      | 3.98  | 4.03        | 3.34        | 4.56        | .61         | .92         | 5.23
2      | 3.5   | 265      | 3.90  | 3.76        | 3.32        | 4.52        | .60         | .88         | 4.84
19     | 3.2   | 264      | 3.92  | 3.99        | 3.31        | 4.39        | .59         | .89         | 5.13
17     | 2.4   | 230      | 3.83  | 3.89        | 3.19        | 4.41        | .61         | .87         | 4.59
12     | 2.1   | 250      | 3.83  | 3.92        | 3.25        | 4.43        | .59         | .81         | 3.77
9      | 1.4   | 205      | 3.69  | 3.72        | 3.12        | 4.26        | .57         | .74         | 3.36
7      | .8    | 263      | 3.90  | 3.96        | 3.28        | 4.49        | .61         | .86         | 4.59
4      | .2    | 121      | 3.29  | 3.23        | 2.75        | 3.70        | .48         | [illegible] | 3.23

five, nine, eleven, twelve, fourteen, sixteen and nineteen are very 
similar to the above in this respect. In constructing such a scale, 
other things being equal, items should be selected which show a rela- 
tively small range in the mean scores of those checking it, if they are 
to be highly discriminating. A consideration of the quartile and 
standard deviations of scores of those checking each item, likewise 
reveals marked differences in the capacity of the items to discriminate. 
Ideally, a very pacifistic item should be checked only by those whose 
scores on the entire scale place them at the pacifistic end of the scale;
certainly few, if any, who are above average in militarism should check it.
This would give a small range and small quartile or standard deviations. 

If a short scale such as this one is to discriminate highly between 
levels of attitude it would seem essential that only a small proportion 
of the items should be checked and that these should show a small 
range in scale values. The wide range in scale values has already 
been noted. Table III shows that the average number of items checked 

TABLE III.—NUMBER OF ITEMS CHECKED AND NUMBER OF SUBJECTS CHECKING
(Total Number Items = 20; Number Subjects = 290)

Number of items checked | Number of subjects checking
[illegible]             | 0
16                      | 2
15                      | 2
14                      | 11
13                      | 21
12                      | 49
11                      | 65
10                      | 64
9                       | 40
8                       | 22
7                       | 8
6                       | 5
5                       | 1
4                       | 0

N = 290


ITEMS CONSTITUTING SCALE WITH SCALE VALUES AT LEFT

SCALE VALUE
 7.5    1. Under some conditions, war is necessary to maintain justice.
 3.5    2. The benefits of war rarely pay for its losses even for the victor.
 9.7    3. War brings out the best qualities in men.
  .2    4. There is no conceivable justification for war.
 6.9    5. War has some benefits; but it’s a big price to pay for them.
 8.7    6. War is often the only means of preserving national honor.
  .8    7. War is a ghastly mess.
 5.5    8. I never think about war and it doesn’t interest me.
 1.4    9. War is a futile struggle resulting in self-destruction.
 8.3   10. The desirable results of war have not received the attention they deserve.
 4.7   11. Pacifists have the right attitude, but some pacifists go too far.
 2.1   12. The evils of war are greater than any possible benefits.
 6.8   13. Although war is terrible it has some value.
 3.7   14. International disputes should be settled without war.
11.0   15. War is glorious.
 6.5   16. Defensive war is justified but other wars are not.
 2.4   17. War breeds disrespect for human life.
10.1   18. There can be no progress without war.
 3.2   19. It is good judgment to sacrifice certain rights in order to prevent war.
 9.2   20. War is the only way to right tremendous wrongs.

was eleven. There are twenty in the scale. The lowest quartile
point is 9.91 and the highest 12.26. The smallest number of items
checked is five and this by but one subject. Only 4.1 per cent of the
entire group check fewer than eight items. Since so large a portion
(seventy-four per cent) check half or more than half of the items of
the scale, it is inevitable that the range of the mean scores of those
checking each item should be narrow as shown in Table II, column four.
The tendency to check a large proportion of the items is not confined
to those with scores near the middle of the distribution of scores. The
mean number checked by those in the highest ten per cent of the group
is 10.2, and by the lowest ten per cent is 8.4.
By subtracting the lowest from the highest scale value checked by
each student, the ranges in the scale values were determined. These
varied from 10.2 to 3.3. Those who checked items differing by 10.2
scale units in value agreed with statements varying from the most
militaristic to the next to lowest in pacificism. The range of the entire
scale is from .20 to 11.00 or 10.8 units. The mean range of the scale
values of the items checked is 7.2 which is two thirds of the range of
the entire scale. Stated briefly, it means that the average student in
this group agreed with statements which have widely divergent
assigned scale values. If a scale value of as low as 1.4 is checked,
the highest scale value checked by the average student would probably
be 8.6. The mean range in scale values checked for the highest ten
per cent (militaristic) is 7.7 and for the upper quartile is 7.7. For
the upper fifty per cent, it is 7.5. For the lowest ten per cent the
mean range in scale values checked is 6.4; for the lowest quartile, it is
6.5 and for the lower fifty per cent it is 6.7. Thus, there appears to be a
tendency for the more pacifistic members to check items varying over a
somewhat smaller range than that found for the more militaristic
members of this group. At both ends of the distribution, as well as at
all other points in it, the range is wide. It is sixty per cent of the range
of the scale for the lowest ten per cent as compared with seventy-one
per cent for the highest ten per cent and sixty-six per cent as the mean
for the entire distribution. In future scale construction, items should
be so selected (if possible) that a subject reacting to the scale will
check (accept) only those statements within a narrow range of scale
values.
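
The range computation described in this paragraph can be sketched as follows. The list of checked values belongs to a hypothetical subject whose most extreme items are the most militaristic (11.0) and the next to lowest (.8); the three mean ranges are the figures quoted in the text, and the function names are illustrative.

```python
FULL_RANGE = 11.0 - 0.2  # the scale runs from .2 to 11.0, or 10.8 units


def checked_range(scale_values):
    """Highest minus lowest scale value among the items one subject checked."""
    return max(scale_values) - min(scale_values)


def share_of_scale(mean_range):
    """A mean range expressed as a fraction of the range of the whole scale."""
    return mean_range / FULL_RANGE


# Hypothetical subject agreeing with items valued 11.0, 6.9, 3.7 and .8.
widest = checked_range([11.0, 6.9, 3.7, 0.8])

for label, value in [("lowest ten per cent", 6.4),
                     ("entire group", 7.2),
                     ("highest ten per cent", 7.7)]:
    print(f"{label}: {share_of_scale(value):.0%} of the scale")
```

The hypothetical subject reproduces the widest range reported, 10.2 units, and the three fractions come out close to the sixty, sixty-six and seventy-one per cent given above.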

In going from the most pacifistic to the most militaristic item, the
mean scores show a definite tendency to increase with an increase in
scale values. The rank-order correlation between scale values and
mean scores is .97. This indicates that, considered either by scale
values or by mean scores, the items form a graded scale. If mean
score is employed as a criterion, items two, seven, nine, twelve, fourteen,
seventeen and nineteen should have approximately the same
weighting. The scale value weightings of these items, however,
vary from .8 to 3.98. A similar relation is found for items five, eight,
eleven, thirteen and sixteen. The mean increase in scale values from 
the most pacifistic to the most militaristic is .57, whereas, the mean 
increase in mean scores of those checking each item is .14. Thus, the 
mean increase in scale values is about four times as great as the mean 
increase in mean scores. The range of the scale values is from .2 to 
11.0 and of the mean scores from only 3.29 to 6.03. 
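
The two figures in this paragraph can be checked against Table II. The sketch below assumes that the rank-order coefficient was obtained with the rank-difference formula, tied values sharing their average rank, and that each mean increase is the difference between the two end items divided by the nineteen steps between them. Neither procedure is stated in the text, so both are assumptions.

```python
# (scale value, mean score) for the twenty items of Table II,
# listed from the most militaristic item to the most pacifistic.
ITEMS = [
    (11.0, 5.89), (10.1, 5.73), (9.7, 5.13), (9.2, 6.03), (8.7, 4.81),
    (8.3, 5.04), (7.5, 4.65), (6.9, 4.26), (6.8, 4.54), (6.5, 4.20),
    (5.5, 4.22), (4.7, 4.15), (3.7, 3.98), (3.5, 3.90), (3.2, 3.92),
    (2.4, 3.83), (2.1, 3.83), (1.4, 3.69), (0.8, 3.90), (0.2, 3.29),
]


def average_ranks(values):
    """Rank from 1 (smallest) upward; tied values share the mean of their ranks."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    start = 0
    while start < len(order):
        end = start
        while end + 1 < len(order) and values[order[end + 1]] == values[order[start]]:
            end += 1
        shared = (start + end) / 2 + 1
        for k in range(start, end + 1):
            ranks[order[k]] = shared
        start = end + 1
    return ranks


def rank_order_correlation(xs, ys):
    """Rank-difference formula: 1 - 6 * sum(d^2) / (n * (n^2 - 1))."""
    n = len(xs)
    d_squared = sum((a - b) ** 2
                    for a, b in zip(average_ranks(xs), average_ranks(ys)))
    return 1 - 6 * d_squared / (n * (n * n - 1))


def mean_step(first, last, n):
    """Average increase per step across n items running from first to last."""
    return (last - first) / (n - 1)


scale_values = [s for s, _ in ITEMS]
mean_scores = [m for _, m in ITEMS]

rho = rank_order_correlation(scale_values, mean_scores)
scale_step = mean_step(0.2, 11.0, len(ITEMS))    # item 4 up to item 15
score_step = mean_step(3.29, 5.89, len(ITEMS))   # mean scores of the same two items

print(f"rho = {rho:.2f}, scale step = {scale_step:.2f}, score step = {score_step:.2f}")
```

On these assumptions the computation reproduces the .97, .57 and .14 reported. A high rank-order coefficient shows only that the items fall in nearly the same order on both measures; it says nothing about how far apart they lie, which is the point the comparison of the two mean increases brings out.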


CONCLUSIONS 


The results show the following: 

1. There are large variations in the numbers of subjects checking 
each item. This is true even for items which have approximately the 
same scale values. 

2. Some items are checked by both the pacifistic and the militaristic 
members of the group. This of course must be the case since some 
items were checked by 90 per cent or more of the subjects. Half of all 
the items were checked by more than two thirds of the group. 

3. The average number of items checked is 11 or more than half of 
the items of the scale. But four per cent check fewer than eight items. 

4. Most subjects check items which vary widely in scale value. 
The mean range of the scale values of the items checked is 7.2 scale 
units. The range of the entire scale is 10.8 units. This result is 
similar for those at the militaristic as well as for those at the pacifistic 
end of the scale. 

5. The items form a graded scale from the point of view of both 
scale values and mean scores. The rank order correlation coefficient is 
.97. This value, however, is somewhat misleading as the next point 
reveals. 

6. Several groups of items with widely varying scale values have 
approximately equal mean score values. These items are of little 
value in discriminating levels of attitude. 

7. The mean score values show an average increase of .14 as com- 
pared with .57 for the scale values. Thus, the scale has a range in 
mean scores but one fourth that of the scale values. 

8. In future scale construction, it would seem essential, if such
instruments are to be made more valid, that the scale values assigned by
judges be regarded as tentative and that they be checked
against the scores of large voting populations. The application of such
principles as those suggested above should result in a scale refined in its
capacity to discriminate levels of attitude and one which contains
fewer functionless items. Items which function in a negative manner
could easily be detected and eliminated.
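
Conclusions 5 and 6 together say that a high rank-order coefficient can coexist with items that barely differ in mean score. A minimal sketch of that point follows. The six data pairs are hypothetical and are not the study's figures, and the formula used is the ordinary Spearman coefficient for untied ranks, which we assume is the rank-order coefficient meant.

```python
def ranks(values):
    """Rank values from 1 (smallest) upward; assumes there are no ties."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0] * len(values)
    for rank, index in enumerate(order, start=1):
        result[index] = rank
    return result


def spearman_rho(x, y):
    """Spearman rank-order correlation for untied data."""
    n = len(x)
    d_squared = sum((rx - ry) ** 2 for rx, ry in zip(ranks(x), ranks(y)))
    return 1 - 6 * d_squared / (n * (n ** 2 - 1))


# Hypothetical data, NOT the study's: scale values spread evenly, mean scores
# rising in nearly the same order but crowded into a narrow band.
scale_values = [0.2, 2.4, 4.5, 6.7, 8.8, 11.0]
mean_scores = [3.3, 3.5, 3.4, 3.6, 3.7, 6.0]

rho = spearman_rho(scale_values, mean_scores)
middle_scale_spread = scale_values[4] - scale_values[1]
middle_score_spread = max(mean_scores[1:5]) - min(mean_scores[1:5])

print(f"rho = {rho:.2f}")
print(f"middle four items: scale spread {middle_scale_spread:.1f}, "
      f"score spread {middle_score_spread:.1f}")
```

In this toy case the coefficient stays above .9 although the four middle items, more than six scale units apart, differ by only a few tenths in mean score.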


The above results indicate a need for a critical evaluation of new 
instruments. Workers in applied fields who employ new scales should 
first investigate the scale before employing it for individual or group 
measurement. Only in this way can we avoid the errors so prevalent 
in the early years of intelligence testing.


ILLUMINATION AND THE HYGIENE OF READING*


MILES A. TINKER 


University of Minnesota 


An adequate discussion of the hygiene of reading should include 
consideration of all factors which may influence comfortable, healthful 
and efficient functioning of the eye in the reading situation. It is 
obvious that illumination is one of the more important of these factors. 
The marked increase in the use of artificial illumination since the turn 
of the century has placed upon illuminating engineers the important 
task of prescribing hygienic lighting. Lighting practice, however, is 
still far from ideal. Eyestrain, with the resulting reflex functional
disturbances of other organs, and sometimes defective vision are held
to be directly traceable to faulty illumination. Economy and artistic
effect are of purely secondary importance in comparison with main- 
taining healthful working conditions for the eye and in comparison 
with the conservation of eyesight. 

The progress of illuminating engineering has been notable. Too 
frequently, however, work on the problem of lighting has been confined 
principally to the source of light. The engineer has directed his efforts 
toward maximum output of light for a given amount of energy. 
Because of this, too little emphasis has been placed upon efficient and
comfortable functioning of the eye in relation to the lighting. Illumi- 
nating efficiency cannot be divorced from visual function, for the 
injurious effects of unhygienic illumination are manifested by impair- 
ment of visual function, by discomfort, or by both. Although recovery 
from such impairment is usually rapid, repeated exposure to faulty 
illumination may result in permanent injury to vision. 

There are now sufficient data from investigations by engineers, 
physiologists, and psychologists to furnish an adequate basis for 
hygienic illumination in the reading situation. We plan to give a 
generalized statement of these findings. The following phases of 
lighting will be considered: (1) Variation in the intensity or brightness 
of light; (2) wave-length or the color of light; and (3) distribution of 
illumination which includes various forms of glare and brightness 
contrast. 





* The material presented in this paper is based upon experimental results. 
The references cited, however, are selected and representative rather than 
exhaustive.


INTENSITY


Both extremes of illumination intensity may be unhygienic under 
present lighting practice. While too dim a light leads to eyestrain, a 
very bright light tends to produce uncomfortable scotomatic glare. 
Most of the investigators have studied the influence of intensity upon 
visual acuity or discrimination of fine details.24 When the intensity 
is increased, the rise in visual acuity is rapid up to an illumination of 
about five foot-candles (a foot-candle is the light intensity of a standard 
candle at the distance of one foot). As the intensity is increased 
further the rise in acuity becomes progressively slower and practically 
reaches a maximum at about twenty foot-candles. Improvement 
in the later stages is scarcely noticeable. The varying effect of
intensity increase at different levels of brightness is revealed by the
67.7 per cent increase in acuity from 0.1 to 1.0 foot-candle, the 43.6 per 
cent increase from 1.0 to 5.0 foot-candles and the mere 8.2 per cent 
increase from 5.0 to 20.0 foot-candles. From twenty to one hundred 
foot-candles the increase in acuity is so slight that it is hardly 
noticeable. 
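
The three percentages above cover intervals of unequal width, so the diminishing return is easier to see when each is expressed per doubling of intensity. The sketch below does this under two assumptions of ours: that each percentage is measured relative to acuity at the start of its interval, and that the gain compounds evenly across the interval. The function name `gain_per_doubling` is illustrative only.

```python
import math

# (start, end, reported percentage gain in acuity), intensities in foot-candles,
# as quoted in the text.
steps = [(0.1, 1.0, 67.7), (1.0, 5.0, 43.6), (5.0, 20.0, 8.2)]


def gain_per_doubling(start, end, percent_gain):
    """Compound percentage gain in acuity per doubling of intensity."""
    doublings = math.log2(end / start)
    return ((1 + percent_gain / 100) ** (1 / doublings) - 1) * 100


per_doubling = [gain_per_doubling(*step) for step in steps]

for (start, end, _), gain in zip(steps, per_doubling):
    print(f"{start:>4} to {end:>4} foot-candles: {gain:.1f} per cent per doubling")
```

On these assumptions each doubling below five foot-candles is worth roughly 17 per cent, and each doubling between five and twenty only about 4 per cent.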

There are several factors which modify this relation between visual 
acuity and degree of illumination: (1) Increasing the light intensity 
produces a greater improvement in ability to discriminate details 
for the defective eye than for the normal eye. With bright
illumination the functional power of the subnormal eye is nearer to 
that of the normal eye than in dimmer light. Adequate intensity 
of illumination, therefore, achieves an increased importance for 
readers with even minor defects of vision. Although the wearing of 
glasses to correct the defects is important, this does not eliminate the 
need of somewhat brighter lighting. Similarly, the middle-aged 
and the old eye appear to need a brighter light than the young eye to 
function with the same degree of efficiency. (2) The benefits of 
increased intensity of illumination are greater when the object to be 
discriminated is small than when it is large. This difference is very 
marked at the lower intensities. (3) When there is a smaller amount of 
brightness contrast between the object to be perceived and its back- 
ground, power of visual discrimination improves much more rapidly 
with increases in intensity of light. The same would be true for
any other print of less than optimum legibility. 

In some instances the results of these studies on visual acuity are 
not readily applicable to the normal reading situation. However, the
relation of illumination intensity to speed of reading and to impairment
of visual function from extended application of the eyes in reading
pertains directly to the hygiene of reading. Speed of silent reading
increases rapidly with lighting intensity up to about one foot-candle 
and then gradually up to approximately ten foot-candles. Above 
ten foot-candles no increase in speed occurs. With print of poor
legibility, such as Old English type, increases in illumination brightness
are relatively more effective in increasing rate of reading and the 
increases extend over a wider range of intensities. 

Decrease in ability to sustain clear seeing (fatigue) due to con- 
tinuous reading for three hours under diffused artificial illumination is 
marked for intensities of less than one foot-candle but practically 
disappears at three foot-candles and above. With well diffused
daylight illumination, reading in a little less than one foot-candle of
light fails to produce more fatigue than higher intensities although
it does reduce the speed of reading. (When the light distribution
is not controlled the results, as we shall see later, are different.) 

In other practical situations similar trends have been discovered. 
There occurred no change in efficiency in a short number-work test 
when the illumination was increased by steps from 9.6 to 118 foot-candles.
In another experiment, letter sorting by post-office employees
reached its maximum speed only when the illumination was at
least eight foot-candles. The increase in efficiency was marked from
two to five foot-candles, and only slight from five to eight or ten.

Recommendations have been made for intensities of illumination 
desirable in various reading situations. For classrooms and libraries 
the suggestions range from three to ten foot-candles; for drafting 
rooms, five to twenty; for proof-reading, six to twelve. It is generally 
accepted that five to eight foot-candles are satisfactory in the ordinary 
reading situation. If fine details are to be discriminated the light
should be considerably brighter, especially if there is less than optimum
brightness contrast between the object to be discriminated and its background.
The extremely high intensities advocated by Luckiesh, however,
should not be accepted as valid for he has failed to coordinate intensity 
with the extremely important factor of distribution which is frequently 
unsatisfactory in ordinary working situations. The most hygienic 
intensity of illumination for any reading situation, however, depends 
upon the system of lighting employed (see distribution below). With 
the best distribution achieved in the ordinary living room or office the 
illumination intensity should not go above about 15 foot-candles.
Intensities considered most comfortable for continuous reading were
found to be about six foot-candles for large type and about thirteen
for medium sized type.
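
For readers who work in metric units, the recommended ranges above can be tabulated and converted. The sketch below is illustrative: the dictionary keys and function names are ours, the ranges are those quoted in the preceding paragraph, and the only outside fact used is the standard conversion of one foot-candle to about 10.764 lux.

```python
LUX_PER_FOOT_CANDLE = 10.764  # one lumen per square foot, in lumens per square metre

# Ranges in foot-candles as quoted in the text; the keys are our own labels.
recommended_fc = {
    "classrooms and libraries": (3, 10),
    "drafting rooms": (5, 20),
    "proof-reading": (6, 12),
    "ordinary reading": (5, 8),
}


def to_lux(foot_candles):
    """Convert an illumination level from foot-candles to lux."""
    return foot_candles * LUX_PER_FOOT_CANDLE


def within_range(situation, measured_fc):
    """True if a measured level falls inside the quoted range for a situation."""
    low, high = recommended_fc[situation]
    return low <= measured_fc <= high


for situation, (low, high) in recommended_fc.items():
    print(f"{situation}: {low}-{high} foot-candles, "
          f"about {to_lux(low):.0f}-{to_lux(high):.0f} lux")
```

A call such as `within_range("ordinary reading", 6)` then checks a meter reading against the quoted range. It says nothing about distribution, which the text treats as the more important factor.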


COLOR OR WAVE-LENGTH 


The spectrum is composed of a continuous series of colors. At one 
end are the violet and blue colors, in the middle are yellow and green, 
and at the other end is red. Light rays derived from a limited range 
of the spectrum are relatively uniform in wave-length and produce an 
approximately pure spectral color. Purity of light affects clearness 
of the visual image through chromatic aberration. This is due to the 
fact that the lens of the eye has a slightly different focal length for each 
color or wave-length. Thus for very near vision, it is possible to focus 
for violet rays, but very difficult to focus for red rays. In far vision 
the reverse is true. For intermediate distances it is easiest to focus for 
yellow rays. When the object is illuminated by mixed wave-lengths 
as in ordinary lighting, we have chromatic aberration in which the 
violet rays come to a focus farther to the front of the eye than the red 
rays, and the yellow rays occupy an intermediate position. These
intermediate rays are automatically focused upon the retina while 
the focal point of the violet rays lies in front and that of the red 
rays behind it. This results in aberration circles for the red and
violet which produce a slight blurring of the optical image, and consequently
a reduction in visual acuity. This blurring is not noticeable
in ordinary circumstances because the brightness of the yellow rays
is very much greater than that of the violet or the red.* Consequently
the effect of the yellow predominates. We might expect, however,
that vision would be clearer if all light rays but those from one region 
in the spectrum were eliminated. That is, the more nearly mono- 
chromatic a light, the sharper the image on the retina. 

Experimental evidence does show that the more nearly we get to 
monochromatic light the greater is visual acuity. Mercury arc light, 
which has a relatively homogeneous wave-length, produces greater 
visual acuity than tungsten light. For discrimination of fine detail, 
therefore, a monochromatic illumination is superior to light composed 
of mixed wave-lengths such as that from a tungsten lamp.
Nevertheless, if the reader is permitted to choose the intensity which
he considers most comfortable for reading, he will choose a higher
intensity for the mercury arc than for the tungsten lamp.



* The other wave-lengths (blue, etc.) will, of course, produce aberration circles
when the yellow rays are in focus on the retina.

Visual acuity varies from one monochromatic (colored) light to 
another. One careful determination found the order yielding greatest 
to least acuity to be: Yellow, yellow-green, orange, green, red, blue-green
and blue. White light composed of rays from all parts of the
spectrum was superior to the yellow when the test object was black on a 
white background. In practice, therefore, yellow is to be preferred to 
any other color or combination of colors (found in ordinary illuminants) 
for discriminating fine detail. But apparently no color is superior to 
sunlight. There is ample evidence, however, that these findings do not 
apply to most ordinary reading situations. It is practically impossible 
to obtain pure spectral lights of sufficiently high intensities for practical 
use. Furthermore, with legible print and adequate intensity of light 
there appears to be no appreciable disadvantage from chromatic 
aberration in reading with the common illuminants. One is about
as good as another. There is, for instance, no sound basis for the belief
that the kerosene flame yields a more hygienic light than more modern
types of illumination. And reading is just as fast in light from a
tungsten lamp as from a mercury arc.

It has been widely believed that a mixture of daylight and artificial 
light is not desirable. As is to be expected from our discussion of color 
and composition of light, daylight furnishes a more efficient illumina- 
tion for discriminating detail than a mixture of artificial and daylight, 
and artificial light alone is less efficient than either daylight or the 
mixture. Nevertheless, during the late afternoon hours when the 
transition from daylight to dark occurs, a considerably greater inten- 
sity of the light mixture (artificial plus what is left of the daylight) is 
needed for satisfactory efficiency of vision than later when the illumina- 
tion is all artificial. There is, however, nothing especially deleterious 
in the mixed light. In this situation the apparently poor efficiency 
of the mixed light is due mainly to a changing condition within the eye 
itself. As is well known the eye is most efficient when fully adapted 
to the illumination used. During the day the eye has become adapted 
to bright light. Then in the late afternoon the onset of darkness is 
so rapid that adaptation of the eye can not keep pace with the change. 
It is this lack of prompt adaptation to the failing illumination rather 
than the mixture of the two types of light that produces less efficient 
vision and unpleasant subjective effects. Both of these may be 
lessened by introducing a fairly high intensity of artificial illumination
while good daylight is still present. This permits a more gradual
transition from full daylight to complete artificial illumination.


DISTRIBUTION OF ILLUMINATION 


In practice, the most fundamental aspect of hygienic illumination, 
and the one most frequently inadequate, is the distribution of light 
in the field of vision. Failure to maintain conditions of distribution 
based upon well established principles results in reduced visual 
efficiency, fatigue, and discomfort of the eyes. 

The influence of lateral illumination on reading efficiency is a 
distribution factor of high importance. Everyone is familiar with 
the disagreeable effects of a bright side light shining into the eyes 
while reading or doing other fine visual work. It has been shown that 
visual acuity is progressively lowered as the lateral source of light 
becomes brighter and is moved closer to the reading page or the 
immediate working surface. When the reading page is more dimly 
illuminated than the peripheral light intensity, the adverse effect on 
vision is most marked. If the reading page is very bright, however, 
visual discrimination may be improved by lateral illumination. Also 
equal brightness of the reading page and the side light (at ten degrees 
distance) leaves visual acuity unaltered when the side illumination 
is added.® 

There are, in this situation, other factors than immediate visual 
acuity to be considered, however. Continuous reading with several 
sources of light in the field of vision results in discomfort and reduced 
efficiency. Moreover, the degree of loss in efficiency appears to be 
directly related to the number of light sources present in the visual 
field; the more lights, the greater the loss. 

This factor of number of light sources in peripheral vision is inti- 
mately related to the efficiency of each system of lighting. There 
are three systems of artificial lighting: (1) In the direct system the light 
is sent directly to the working surface, usually from an open-faced 
reflector. This results in a concentration of light on the working 
plane, usually with a marked sacrifice of even illumination since 
brightness extremes in the field of vision are emphasized. Frequently 
the eye is not properly shielded from this primary source of light. 
(2) In the indirect system of illumination the source of light is entirely
concealed from the eye. The light is first directed against the ceiling 
or other surface from which it is diffusely reflected throughout the 
reading room. (3) The semi-indirect system represents a compromise
between the direct and the indirect systems. Part of the light is
transmitted through underhanging translucent reflectors of varying 
density directly to the reading surface, and part shines against the 
ceiling and is then reflected downward to the working plane.* 

In these systems even distribution of light is best achieved by the 
indirect system. Nevertheless, the semi-indirect systems, in which 
only a small portion of the light is direct, due to rather dense reflectors, 
are nearly as good. With translucent reflectors of less density, how- 
ever, distribution is more uneven. And with the direct system the 
distribution is, of course, least even. 

At the end of a three to four-hour period of reading under these 
systems of lighting, greatest loss of efficiency (fatigue) occurs with the 
direct, next greatest with the semi-indirect, and least of all with the 
indirect system which is nearly as good as diffuse daylight where there 
is practically no loss. The difference in loss of efficiency under the
three systems is intimately related to the number of light sources
(lateral illumination) in the field of vision. For example, when there
are six lights in the visual field, the direct and semi-indirect systems
produce a great deal more fatigue than the indirect system. With
four lights the differences are less marked, with two still less, and when 
no light fixtures are in view there is comparatively little difference.® 

This loss of efficiency sustained by the eye in the presence of lateral 
illumination seems to be muscular rather than retinal. This is due 
to the fact that the muscles are subjected to a greatly added strain
by the side lights. Stimulation of the peripheral portions of the 
retina, which are extremely sensitive to bright lights, arouses a reflex 
tendency to fixate these side lights rather than the printed page. 
Added to this there is a strong incentive to fixate and to accommodate 
for these lateral spots of light, all at different distances from the eye. 
There must be constant action of the antagonistic muscles to inhibit 
these tendencies. The result is excessive strain which is manifested 
by loss of efficiency.

Another distribution factor of prime importance is contrast within 
the relatively small field where critical vision is required. When the 
eyes, in reading a printed page or in discriminating fine details, are 
required to shift from bright to dark areas and vice versa, there is
constant re-adaptation to the different light intensities. This soon
results in muscular strain which is reflected in lessened visual efficiency.
When the demarcation between the light and dark areas is sharp the
fatigue effects are enhanced. Marked contrasts of brightness should
be avoided whether in the immediate working field or in peripheral
vision.



* The intensities of illumination recommended for this system are usually
too high to maintain maximum efficiency during continuous reading. About
2.5 foot-candles are better than five. In the direct system the intensity should
be even lower if serious fatigue is to be avoided.

The relative brightness of the background immediately surrounding 
the reading page is also a factor to be considered. It is well known 
that visual discrimination is relatively poor when one attempts to 
discern objects or words in a small, dimly illuminated area surrounded 
by a highly illuminated field. It is equally undesirable to have the 
small working field bright and the surroundings so dark that decided 
marginal contrasts in brightness are present. With a relatively large 
working field, however, the illumination of the surroundings is of less 
importance as a disturber of visual precision. Nevertheless, wide 
extremes of contrast between the two fields should be avoided because 
of the fatigue effects of brightness contrast discussed above.® 

Another phase of light distribution is surface reflection which is 
either specular or diffuse. In specular reflection a ray of light leaves 
the reading surface at an angle equal to the angle of incidence. Reflec- 
tion of light from the surface of a mirror is specular. The light rays 
in diffuse reflection are reflected in all directions from each point on 
the working plane as if the reading surface itself were a source of light. 
The specular reflection occurs from polished or glazed surfaces, the 
diffuse from mat or rough surfaces. Diffuse reflection alone yields 
hygienic illumination for reading. Light specularly reflected does 
not reach the eye at all, or is harmful to vision on entering the eye 
because of the glare* produced. The maximum visual disturbance 
arising from specular reflection occurs when a direct beam of light 
strikes a glazed surface which is viewed from approximately the same 
angle as that from which the light arrives at the surface. If the 
illumination is adequately diffused before it reaches the reading page, 
little glare results even if that surface is glazed or shiny. It is obvious 
that the amount of glaze permissible in printing paper depends largely
upon the system of illumination (direct or indirect) to be used for
reading.



* Glare is any brightness in the field of vision of such a character as to cause
annoyance, strain, or reduced visual acuity. It may arise from excessive brightness
of the light source, from excessive brightness contrast in the field of view, from
reflection from polished and shiny surfaces in the field of vision, or from various
kinds of lateral illumination.
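
The rule of specular reflection described above, that a ray leaves the surface at an angle equal to its angle of incidence, can be sketched in two dimensions. The vector formula r = d - 2(d.n)n is the standard mirror-reflection identity and is supplied by us, not taken from the text. The 30-degree beam and the function names are illustrative assumptions.

```python
import math


def reflect(direction, normal):
    """Mirror a 2-D ray direction about a unit surface normal: r = d - 2(d.n)n."""
    dot = direction[0] * normal[0] + direction[1] * normal[1]
    return (direction[0] - 2 * dot * normal[0],
            direction[1] - 2 * dot * normal[1])


def angle_from_normal(vector, normal):
    """Unsigned angle in degrees between a vector and the unit surface normal."""
    dot = vector[0] * normal[0] + vector[1] * normal[1]
    length = math.hypot(vector[0], vector[1])
    return math.degrees(math.acos(abs(dot) / length))


# A page lying flat, with its normal pointing straight up.
normal = (0.0, 1.0)

# A beam travelling downward, 30 degrees away from the normal (assumed value).
theta = math.radians(30)
incident = (math.sin(theta), -math.cos(theta))

reflected = reflect(incident, normal)
angle_in = angle_from_normal(incident, normal)
angle_out = angle_from_normal(reflected, normal)

print(f"incidence {angle_in:.1f} degrees, reflection {angle_out:.1f} degrees")
```

An eye placed along the reflected direction receives the whole beam from a glazed page, which is the condition of maximum disturbance the text describes. A mat surface would scatter the same beam in all directions.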

We now know that the eye needs protection from primary sources 
of light. This is accomplished either by lamp shades or by eye shades.
We have already mentioned the relation of the former to systems of 
lighting. For the latter, an opaque eye shade with a white lining 
has proved to be best. Shades with dark linings and translucent 
shades dense enough to protect the eyes adequately from the light 
source produce harmful brightness differences in the field of vision. 
To render the whole upper half of the visual field dark in sharp con- 
trast with the brilliantly illuminated lower half creates a very unnat- 
ural brightness relation which enhances glare effects. It has been 
shown, furthermore, that there is no eye shade that produces as 
efficient vision as that achieved under the indirect system of lighting. 
To produce the best reading illumination the shade should be put 
on the lamp rather than on the eyes.® 


SUMMARY 


The three aspects of illumination considered in relation to the 
hygiene of reading are intensity, wave-length and distribution of 
light. When discrimination of rather fine detail is required in reading, 
a minimum intensity of about eight to ten foot-candles is essential. 
Higher intensities are sometimes advisable. In ordinary reading, 
however, a wide range of intensities may be employed without dis- 
comfort or fatigue although the speed of reading may become less 
at the lower intensities. If distribution is adequate, reading may be 
safely done under illuminations of from two to forty foot-candles. 
In most situations where the illumination is semi-indirect or direct, a 
brightness of two to three foot-candles is better than higher inten- 
sities. With highly diffused light as in the indirect system, five to 
ten foot-candles may be considered adequate for ordinary reading. 
The average reader, however, prefers a somewhat brighter light if it 
is available. 

It should be emphasized that control of light intensity alone can 
not produce hygienic illumination. Brightness of the illumination 
must always be considered in relation to the condition of the eyes, 
legibility of the print, and distribution of the light. When the eye 
has a refractive error, when the reader is old, and when the copy is of 
less than optimum legibility, the illumination should be made brighter. 
In all situations where indirect (diffused) lighting is not employed,
however, the lighting should be relatively dimmer in order to maintain
the most healthful conditions possible for reading. This means that 
much of our reading should be done in relatively dim light (about 
three foot-candles) since most lighting is direct or semi-indirect. The 
tendency is to err on the side of too intense illumination. 

Where very fine discrimination is required in visual work, a light 
of relatively homogeneous wave-length yields clearer vision than the 
mixed wave-lengths in ordinary lighting. Of the monochromatic or 
colored lights, yellow is best and blue is poorest for fine discrimination. 
Sunlight, however, is better than any monochromatic light. In 
practice these findings can not be applied directly to most reading 
situations where fine discrimination occurs less frequently than in 
special visual tasks. The ordinary artificial illuminants, which have 
a mixed wave-length, yield as efficient reading as monochromatic light. 
Nevertheless, we do find that sunlight is a more effective illuminant 
than any color, and also that it is preferred over any artificial light. 

Distribution is the most fundamental aspect of hygienic illumination
and also the aspect which is least well controlled in artificial lighting.
The distribution of illumination is most adequate when the
evenness of illumination, the evenness of brightness at the working
surface, and the diffusion of the light are at a maximum. The types of
light distribution from most to least effective in promoting hygienic 
functioning of the eye in reading are: Diffuse daylight, indirect system, 
semi-indirect system, and direct system. In the latter two of these 
systems the effects of lateral illumination become progressively worse 
as the number of light sources in the field of vision increases. Marked 
contrasts in surface brightness within the immediate working surface 
or in peripheral vision should be avoided. 

The maintenance of efficient visual functioning, and in some cases
the preservation of normal eyesight, is conditioned upon illumination
which fulfills the minimum requirements of hygienic lighting. Too
great emphasis can not be placed upon the proper coordination of 
intensity and distribution factors in order to achieve the most hygienic 
illumination possible in any particular situation.


HUMAN DRIVES


PERCIVAL M. SYMONDS 
Teachers College, Columbia University 


This paper is concerned with a restatement of the problem of the 
drives or urges at work within the human organism which make the 
individual go, in the first place, and make him go in one direction 
rather than another, in the second. This problem is perennially 
important in education. All learning must take place in response 
to the demands of the organism. These drives are not to be brooked, 
and when normal satisfactions are not easily obtainable an individual 
is driven to abnormal or pathological behavior. A knowledge of what 
drives are fundamental is essential, then, for an understanding of what 
adjustments an individual must make, and why certain kinds of 
adjustments frequently are made. 

As a term applied to human beings instinct has been practically 
defunct in psychological discussions of motivation for the past decade. 
The theory of motivation based on the concept of instinct was first
formally stated by William James in a series of magazine articles¹
in 1887 and later in his two volume work on Psychology.² James,
however, received inspiration and suggestions from two earlier writers
on instinct: one, W. Preyer,³ who in 1881 wrote a book on developmental
psychology based largely on a complete diary of his own son
kept from birth to the end of his third year, and the other, G. H.
Schneider,⁴ who wrote more theoretical works on animal and human impulses.

Although the concept of instinct has been vaguely traced to the
Greeks and to mediaeval scholars,⁵ the real impetus to the modern





formulation came from the work of the great biologists of the nineteenth
century, particularly Darwin (Preyer was a follower of Darwin),
Herbert Spencer, Lloyd Morgan, and others, and the modern concept
of instinct may be said to have developed from the biological study of
animals on the one hand, and the physiological study of the nervous
system and basic organic reactions on the other.



1 James, W.: “What is an Instinct?” Scribner’s Magazine, Vol. XXX, 1887,
pp. 433-451.

James, W.: “Some Human Instincts.” Popular Science Monthly, Vol.
XXXI, 1887, pp. 160-170, 666-681.

2 James, W.: Psychology II. Chapter XXIV, Henry Holt & Co., 1890.

3 Preyer, W.: Die Seele des Kindes. 1881, 9th edition, Leipzig, 1923, translated
into English by H. W. Brown and published under the titles The Senses and the
Will. D. Appleton Company, 1888, and The Development of the Intellect. D.
Appleton Company, 1889.

4 Schneider, G. H.: Der Thierische Wille, 1880, and Der Menschliche Wille,
1882.

5 See article by Bernard, L. L.: Instinct, in Vol. VIII of Encyclopedia of
the Social Sciences, The Macmillan Co., 1932; also Wilm, E. C.: Theories of
Instinct; a Study in the History of Psychology. Yale University Press, 1925.

Thorndike, a student of James, elaborated James’ theory of instinct, 
and applied it definitely to educational theory.¹ Thorndike in his
Human Nature Club gave an early statement of the point of view which 
he has since persistently held. 


We inherit certain connections between nerve-cells which make us act in certain 
circumstances in definite ways, without our learning how, or thinking about the 
matter at all, or hearing what we are going to do. Our inherited constitution
makes us breathe and suckle and smile and reach for things and walk and be 
afraid in the dark, just as it makes us sleep and digest food and grow. We call 
such unlearned activities, instincts, or native reactions. Such activities may appear 
before birth or at birth or be delayed till after birth. They may be transitory, 
that is, may stay for a while and then disappear if not exercised and rendered 
habitual. Some of them we have in common with a great many of the lower 
animals. Some of them are peculiar to the human race. On the basis of these 
instinctive acts develop all our later acquisitions. 


William McDougall has been another champion of instinct as the 
“essential springs or motive powers of all thought and action”2 (p. 20).
McDougall defines instinct as an “innate specific tendency of the
mind.” With dogmatic abruptness he rejects the possibility of what
he defines as instinctive actions being learned.


Or what could be more strained and opposed to hundreds of familiar facts 
than Herbert Spencer’s doctrine that the emotion of fear provoked by any object 
consists in faint revivals, in some strange cluster, of ideas of all the pains suffered 
in the past upon contact with, or in the presence of, that object? (p. 45) 


Guided by the weight of these authorities instinct was accepted 
quite universally (passively, at least) as the basis of human motiva- 
tion by 1919. But the lists of specific instincts became longer and 


1 Thorndike, E. L.: Instinct. Fifth Biological Lecture from the Marine 
Biological Laboratory of Woods Hole. Ginn & Company, 1899. See also Thorn- 
dike, E. L.: The Human Nature Club. Chapter II, especially pp. 27, 28. Long- 
mans, Green & Co., 1900. The more elaborate and systematic presentation is 
to be found in his Educational Psychology. Vol. I, “The Original Nature of
Man,” Bureau of Publications, Teachers College, 1913. 

2 McDougall, W.: An Introduction to Social Psychology. John W. Luce & Co., 
1908, 22nd edition, 1931. 
longer and finally the elaborate theoretical structure broke beneath
its own weight. The initial demolishing shot came from Knight
Dunlap in a critical paper read before the American Psychological
Association in Cambridge, Mass., December 29, 1919.1 In this paper,
although Dunlap did not deny that “there is a great deal of instinc-
tive activity, both conscious and unconscious, and probably both
volitional and non-volitional,” he claimed that the recent classifica-
tions of instincts had lost touch with physiological realities and that it
would be better not to refer to specific instincts.

Certain sociologists,2 approaching the problem of human motiva-
tion from a fresh point of view, found the concept of instinct not
sufficient for a proper interpretation of social behavior. Bernard3 says


The category of instinct, which serves very well the purposes of describing 
the activities of lower organisms, proves to be entirely inadequate for an account 
of human social behavior. Only habit and constantly and easily modified acquired 
reactions can serve his complicated and voluminous adjustment needs. (p. 10) 


and as a result of his survey of the problem Bernard states 


The instincts are very early overlaid by acquired habits in the process of
adapting the individual to his environment . . . the child who has reached a 
rational age is reacting in nine-tenths or ninety-nine one-hundredths of his char- 
acter directly to environment, and only in the slight residual fraction of his nature 
directly to instinct. (p. 524) 


In the meantime and apparently independently the attack on 
instinct was gathering force from within psychology itself. In an 
excellently written paper Kuo4 proposed that the concept of instinct
had no place whatever in the interpretation of human behavior, 
basing his argument on purely behavioristic grounds. After demon- 


1 Dunlap, K.: “Are There Any Instincts?” Journal of Abnormal and Social
Psychology, Vol. XIV, Dec., 1919, pp. 307-311. See also his later paper “The
Identity of Instinct and Habit.” Journal of Philosophy, Vol. XIX, 1922, pp.
85-94.

2 Bernard, L. L.: “The Misuse of Instinct in the Social Sciences.” Psycho-
logical Review, Vol. XXVIII, 1921, pp. 96-119; Faris, E.: “Are Instincts Data
or Hypotheses?” American Journal of Sociology, Vol. XXVII, 1921-1922, pp.
184-196. McDougall, W.: “Can Sociology and Social Psychology Dispense with
Instincts?” and discussion by L. L. Bernard in American Journal of Sociology,
Vol. XXIX, 1923-1924, pp. 657-673.

3 Bernard has given a thorough elaboration of his point of view in his book
Instinct: A Study in Social Psychology. Henry Holt & Company, 1924.

4 Kuo, Z. Y.: “Giving Up Instincts in Psychology.” Journal of Philosophy,
Vol. XVIII, Nov. 24, 1921, pp. 645-664. See also Kuo’s later article: “How Are
Our Instincts Acquired?” Psychological Review, Vol. XXIX, 1922, pp. 344-365.
strating how many of the so-called instincts could have developed
by the process of learning from more elementary acts or reflexes he 
states, 


If we watch the stages of development of human behavior closely enough, 
we shall not have any difficulty to trace the sources of social influences. To call 
an acquired trend of action an instinct is simply to confess an ignorance of the 
history of its development. (p. 650) 


Kuo gives his positive theory of man’s native equipment as follows:


The human infant is endowed with a great number of units of reaction. By 
units of reaction I mean the elementary acts out of which various coordinated 
activities of later life are organized. The reaction units are what we find in the 
child’s spontaneous activities and random acts. The new-born baby is charac- 
terized by being easily aroused to action; it is exceedingly active. . . . These 
reaction units are the elements out of which all the coordinated acts of the organ- 
ism are integrated. (pp. 658, 659) 


Kuo, while presenting no new evidence, illustrated the possibility that
acts formerly thought instinctive are the product of learning.

Finally John B. Watson, on the basis of his experiments in condi-
tioning infant emotions, comes out with extreme statements doing
away with instincts altogether.1


There are then for us no instincts—we no longer need the term in psychology. 
Everything we have been in the habit of calling an “instinct” today is a result
largely of training—belongs to man’s learned behavior. . . . (p. 74) The infant
is a graduate student in the subject of learned responses (he is multitudinously 
conditioned) by the time behavior such as James describes—imitation, rivalry, 
cleanliness, and the other forms he lists—can be observed. (p. 104) 


(Watson at the same time goes to such absurd extremes as to claim 
that ‘‘there is no such thing as an inheritance of capacity, talent, 
temperament, mental constitution and characteristics.”)

Having reached this point in throwing over instincts, psychologists 
have been busy for a decade peering into infant behavior to discover 
and describe how it is possible for the manifold activities of a baby’s 
day to be learned. 


The question of what motivates human behavior cannot be so
easily disposed of, however. James, Thorndike, and McDougall may
have oversimplified the problem of human behavior by suggesting 





1 Watson, J. B.: Behaviorism. W. W. Norton & Company, 1924. Chapters 
5 and 6. 
that it is built on an original structure of preformed instinctive
behavior. But whatever the accuracy of their theories, one is con-
vinced by reading their descriptions of instinct that they are dealing
with something so universal and with such a driving quality that it 
cannot be wholly subsumed under the guise of habits built on purely 
random activities. 

Tolman was one of the first to propose a substitute theory of
“driving adjustment.”1 Tolman’s theory in brief is that excitement
of the smooth muscles of the visceral and organic systems of the body
causes a change in adjustment or disequilibrium which has a tendency
of its own to seek equilibrium again. This change in adjustment in
turn stimulates the somatic system of the body to activity which
. . . reduction of the autonomic disequilibrium. Any act of the somatic
system is successful which provides new stimuli for neutralizing or
opposing the autonomic adjustment, thus bringing it back to
equilibrium.

To give a familiar concrete illustration, hunger is known to consist 
of a series of rhythmical contractions of the smooth muscle wall of 
the stomach. This disequilibrium initiates restless searching move- 
ments until food is found and delivered to the stomach whence the 
muscle contraction waves cease and equilibrium is restored. 

Tolman apparently received help on his theory from four sources. 
W. Craig2 had already formulated a theory of appetites and aversions
as the constituents of instinct as a result of his observation of the sexual 
anomalies of male doves reared in isolation in which the variability of 
instincts was demonstrated. 

Woodworth3 had earlier elaborated a very similar theory in his
lectures on Dynamic Psychology, although in those lectures the nature
of a “consummatory reaction” was not very explicitly stated.





1 Tolman, E. C.: “Can Instincts be Given up in Psychology?” Journal of
Abnormal and Social Psychology, Vol. XVII, 1922-1923, pp. 139-152. See also
his earlier paper on “Instinct and Purpose.” Psychological Review, Vol. XXVII,
1920, pp. 230.

2 Craig, W.: “Male Doves Reared in Isolation.” Journal of Animal Behavior,
Vol. XIV, 1914, pp. 121-133.

Craig, W.: “Appetites and Aversions as Constituents of Instincts.” Bio-
logical Bulletin, Vol. XXXIV, 1918, pp. 91-107.

3 Woodworth, R. S.: Dynamic Psychology. New York, Columbia University
Press, 1918.
A third source for Tolman’s theory was the work of his own
teacher, Ralph Barton Perry,1 Professor of Philosophy at Harvard,
who in a series of articles indicated the possibility of an “automatic”
explanation of adjustment.

Finally Kempf2 presented a theory of the autonomic nervous system
which provided a framework for the physiological phases of the theory.

Tolman,3 shortly after preparing his theory of “driving adjustment,”
elaborated the theory by preparing a list of the “fundamental drives.”
Tolman divides drives into two main categories, the appetites and
aversions. Appetites are initiated by internal physiological disturb-
ances, rhythmical in nature. “When the rhythm reaches the proper
part in its cycle, the organism becomes restless and embarks upon
exploratory movements, until finally it comes by chance” upon the
appropriate stimulus which is capable of reducing the disturbance.
Appetites lead to seeking activity.

Aversions, on the other hand, are drives to get away from external 
situations which cause physiological disturbances (injury, pain, 
physiological blocking, etc.). “In the case of the aversion the
physiological inciting state tends to be a more enduring and constant 
affair.” Aversions lead to avoiding behavior. 

Tolman then proposed that drives originating directly from physio- 
logical disturbance be called first-order drives while acts which in the 
past have proven that they serve the direct physiological drives by 
helping to bring the organism in contact with the necessary reduction 
stimulus (or in the case of aversions, away from the irritating stimulus) 
be called second-order drives. Curiosity, for instance, would be a 
second-order drive, for it helps to bring distant objects within closer 
range, and hence may serve both hunger and sex, the first-order drives. 
In many respects these second-order drives are more like the aversions 
than like the appetites. 

In his recent book, Purposive Behavior in Animals and Men, 
Tolman4 gives a list of fundamental drives derived from his mature
consideration of the problem. 


1 See Perry, R. B.: “Docility and Purposiveness.” Psychological Review,
Vol. XXV, 1918, pp. 1-20.

2 Kempf, E. J.: The Autonomic Function of the Personality. New York:
Nervous and Mental Disease Publishing Co., 1918.

3 Tolman, E. C.: “The Nature of the Fundamental Drives.” Journal of
Abnormal Psychology & Social Psychology, Vol. XX, 1926, pp. 349-358.

4 Tolman, E. C.: Purposive Behavior in Animals and Men. D. Appleton-
Century Co., 1932.
APPETITES                       AVERSIONS                              SECOND-ORDER DRIVES
Food-hunger.                    Fright (injury-avoidance).             Curiosity.
Sex-hunger.                     Pugnacity (interference-avoidance).    Gregariousness.
Excretion-hungers.                                                     Self-assertion.
Specific-contact-hungers.                                              Self-abasement.
Rest-hunger.                                                           Imitativeness.
Sensory-motor-hungers
  (i.e., the esthetic and
  play hunger).


H. L. Hollingworth in his recent book on Educational Psychology1
has woven this point of view into a thorough-going discussion of human
motives as applied to education. He says,


Action was found to be the result of disturbance, stress, or upset of
equilibrium. We may suppose that the so-called instinctive acts are also responses
to some mental or bodily disturbance, to some irritant, which produces restlessness
and activity until it is relieved.


In the same chapter Hollingworth gives a list of twenty-three “dis-
tresses” which he calls a “sample list of man’s original activities or
fundamental adjustments.” In presenting such an extended list
Hollingworth tends to lead back to the same overloaded system which
caused the breakdown of the instinct theory.


It is the purpose of this paper to explore the question of the funda- 
mental human urges or drives with the purpose of arriving at a formula- 
tion which is useful to educators who wish to understand the springs 
of human behavior. Certain wishes or urges or activities of a child 
may be so easily turned aside that it is evident that they are not very 
deep-rooted. One may substitute (with a little tact) a peach for a pear
or a toy automobile for a picture book without creating any pronounced
disturbance. On the other hand, depriving a hungry child of food has 
unmistakable results. 

It is also evident that it is not possible or practicable in every case 
to go back to hunger, sex, thirst, desire for sleep, etc. as the urge of the 
moment. In most social activity it would require a considerable 
stretch of the imagination to make such an interpretation of the urge 
which is at work. 

As early as 1923 W. I. Thomas, a sociologist, influenced by the work 
of John B. Watson, attempted such a formulation. He says2 (pp. 3, 4)





1 Hollingworth, H. L.: Educational Psychology. D. Appleton & Company,
1933.

2 Thomas, W. I.: The Unadjusted Girl. Little, Brown & Company, 1923.
We understand . . . that . . . of emotion mean a preparation
for action which will be useful in preserving life (anger), avoiding death (fear),
and in reproducing the species (love), but even if our knowledge of the nervous 
system of man were complete we could not read out of it all the concrete varieties 
of human experience. The variety of expressions of behavior is as great as the 
variety of situations arising in the external world, while the nervous system repre- 
sents only a general mechanism for action. We can, however, approach the 
problem of behavior through the study of the forces which impel to action, namely, 
the wishes, and we shall see that these correspond in general with the nervous 
mechanism. 

The human wishes have a great variety of concrete forms but are capable of 
the following general classification: 


1. The desire for new experience.
2. The desire for security.
3. The desire for response.
4. The desire for recognition.


Reading on, one finds that these are simply new terms for the 
tendencies expressing the following dominant urges (1) fighting 
(hunting), (2) fear, (3) sex, (4) ego, (food getting?). In short Thomas’ 
list presents in less conventional terms Tolman’s two aversions and the 
two most important appetites. It requires some stretching of the 
imagination, however, to see how these basic physiological drives 
carry over into the derived forms as described by Thomas’ four wishes.1

At the end of this chapter Thomas adds the significant statement, 


We may assume also that an individual cannot be called normal in which all 
the four types of wishes are not satisfied in some measure and in some form. 


Thomas’ list with two additions has been presented by Watson 
and Spence in their Educational Problems for Psychological Study.2
They say, 


There are a few fundamental drives which lead to a restless series of behaviors 
designed to bring about a change from the unsatisfactory state of affairs in a 
given direction of adjustment. There are a thousand ways of satisfying hunger, 
and perhaps even more specific behaviorisms any one of which might be employed 
in satisfying an urge toward mastery and success. For understanding people, 
nothing seems more important than insight into the ways in which they are 
moved by these fundamental psychological pressures. Six major trends may be 
mentioned. 





1 An elaboration of Thomas’ theory of wishes is to be found in Folsom, J. K.:
Social Psychology. Chapter 4, Harper and Brothers, 1931.

2 Watson, G. B. and Spence, R. B.: Educational Problems for Psychological
Study. Appendix A, “The ABC of Educational Psychology.” The Macmillan
Company, 1930.
1. Human beings tend to behave in ways involving movement from physical
deprivations (pain, hunger, sex demands, needs for sleep), toward physical well-
being, euphoria.

2. Human beings tend to behave in ways involving movement from failure, 
thwarting, disappointment, toward success, mastery and achievement. 

3. Human beings tend to behave in ways involving movement from being 
ignored or looked down upon, toward being looked up to, recognized, approved, 
admired. 

4. Human beings tend to behave in ways involving movement from being 
unwanted toward being loved and given intimacy, tenderness, and a sense of 
belonging. 

5. Human beings tend to behave in ways involving movement from being 
worried, anxious, fearful, toward release, security, and peace of mind. 

6. Human beings tend to behave in ways involving movement from being
bored, finding life dull and monotonous, toward adventure, new experience and 
zestful activity. 


One should note that four of Watson and Spence’s drives are 
identical with Thomas’. In addition, Watson and Spence have 
included the physical demands (which Thomas implies were subsumed 
in his more derived list) and success which Thomas does not mention, 
although it is so closely related to recognition that some have debated 
whether one exists apart from the other.

It should be noted that Watson and Spence have made each drive
partake of the nature of both aversion and appetite, that is, tendency 
away from and toward. 


As a result of this survey I wish to propose a new formulation of the
fundamental human urges or drives. There seem to be at least three
distinct types.

1. The first type consists of descriptions or characterizations of 
the process of adjustment itself, looking at this process from various 
angles. For instance, adjustment is a drive toward the reduction of 
organic, visceral, or postural tensions and away from tissue destruction. 
The gestaltists have recognized this in their principle of closure. A 
dominant chord struck on the piano sets up tensions which require a 
tonic chord to restore the equilibrium. Any movement initiated in an 
attempt to reduce tensions set up has its own drive to completion. 
Regardless of the nature of the tension or of the process set in motion 
to relieve it there is a drive toward success. I have found in studies 
of the adjustment of school children that the drive toward success, 
whatever the nature of the activity, is the most potent of all drives. In
every individual there is a drive, strong or weak as it may be, to the 
successful completion of what one undertakes. 
A second characterization of adjustment is a tendency to seek a
situation and to repeat an act which in the past has relieved a tension
or distress and has led to equilibrium. This is a tendency toward
familiarity—the conservative tendency in men. Opposed to this is a
third characteristic of adjustment—a tendency to random and restless
behavior seeking for a situation and stimulus which will relieve the 
distress in case no familiar stimulus which has served successfully in the 
past is at hand. This leads to a tendency toward exploration. With 
the young and those with urgent drives the tendency is toward explora- 
tion, while the old and successful are conservative and tend to strive 
to retain the usual and the familiar. 

It is quite possible that there are, besides these three, other char- 
acteristics of adjustment in general apart from the particular type of 
tension or disturbance from which equilibrium is sought. 

2. The second type consists of the first order appetites and aver- 
sions which are groups of visceral or organic disturbances or tensions to 
be relieved by appropriate stimuli. Here I am quite willing to follow 
Tolman’s list. 


3. The third type consists of derived drives—similar in character
to Tolman’s second order drives—drives which have been learned or
acquired in the process of discovering ways and means of satisfying
the primary organic drives. It is here that the greatest differences are 
found in the lists proposed by psychologists. It should be obvious that 
there is nothing necessarily universal in the nature of these derived 
drives except as the conditions of nurture throw all infants into approxi- 
mately the same situations for the satisfactions of their primary drives. 
It is for this reason that I would criticize most lists of instincts and
even the list of “primary distresses” proposed by Hollingworth.
For instance, Hollingworth mentions as a primary distress (p. 97)
“nakedness, shame, feelings of low worth” leading to such behavior as
“decoration and exhibition of person and belongings or attainments.”
One has only to watch the entire unconcern of infants playing happily
in a “state of nature” or the equal unconcern of primitive tribes to
realize that this is a learned distress and is not “primary.” It is true
that maladjustments are found which seem to include a component of 
distress at nakedness, but such maladjustments can be understood only 
by going back to the really primary drives on which this sense of shame 
or feelings of inferiority have been built. 

In like manner we would object to listing “ugliness, discord, and
lack of harmony in surroundings” as a “primary distress” leading to
such behavior as “decorative arrangement and disposition of materials
provocative of relief (production of beauty)” when it is so evident that 
this tendency is learned in response to some other drive which is more 
fundamental. The urge toward beauty is by no means so universal, 
as evidenced by the millions of people who live in utterly mean and 
wretched surroundings, that it could be called primary. It develops 
too late in life to be a universal urge. 

These second-order derived drives, to be at all fundamental, must 
be learnings attached to the primary drives in the first few weeks or
months of life, and under conditions which are practically universal 
for every human being. Such reactions are those learned in connection 
with the acts of feeding, urinating and defecating and in avoiding pain, 
all of which, in the helpless state of infancy, require the assistance of 
others, particularly the mother. Any list, therefore, which may be 
made of the “fundamental” second-order drives must be the personal 
choice of some individual. As the list is added to the drives become 
less and less universal and fundamental. There is no sharp dividing 
line between the second-order drives which may be counted on to 
operate in everyone and the very personal and specific wants of each 
individual. 

We will list here nine second-order drives which seem to be of 
fundamental importance in interpreting problems of human adjust- 
ment. One is the desire to be with or in the presence of other persons, 
the old instinct of gregariousness. If this drive has good reason for 
being first on the list, it is because the baby is by force attended by 
others from birth onwards as a necessary part of his existence. Prac- 
tically every satisfaction which he gains has associated with it the 
presence of another person. Little wonder, then, that the presence of 
other persons is demanded and enjoyed. 

A second derived drive is the desire for attention from other 
people, closely related to the first, but possessing elements of its own. 
The baby not only wants other persons around but wants them to 
respond to him by looking at him, by speaking to him, by fondling him, 
and the like. This demand for attention from others continues as the 
baby matures into childhood and is a potent drive all through life. 

Growing out of the last as a third drive is the desire for praise and 
approval. Not only does the baby want to be noticed but he has 
learned to crave certain kinds of notice such as smiles, friendly nods, 
pats and the whole complex behavior of sympathetic understanding, for 
these evidences of approval have occurred and have been associated 
with occasions when he has satisfied his desires of feeding, etc. These
experiences which we call evidences of approval of themselves probably 
cause nervous relaxation and a pleasant body tone because of their 
associations. Consequently, everyone has learned to do the things 
which bring on these tokens of esteem and admiration which are 
valued for their relaxing powers. The desire for approval, applause, 
praise, and the like, becomes one of the strongest driving forces which 
human beings respond to. 


Fenton1 speaks of


A certain satisfaction [in the infant], dim and unformed enough, to be sure, 
but none the less distinguishable, in producing results himself, not only for the 
sake of the particular result, but in part also for the sake of a rudimentary sense 
of power awakened in him by his own activity in causing the result. . . . Toward 
the end of the first year delight in doing things for himself becomes a motive for 
him, he must manage alone every activity that he could possibly achieve, and 
correspondingly as his abilities increased, his desire for independence and self- 
sufficiency waxed stronger, more compelling. 


This is much the same as what Thorndike2 had in mind when he
said,


To do something and have something happen as the consequence is, other 
things being equal, instinctively satisfying, whatever be done and whatever be 
the consequent happening. . . . For example, a baby likes not only to see a pile 
of blocks tumble or a wheel go around, but also to find the blocks tumbling when 
he hits them, or the wheel revolving when he pushes a spring. 


It is not necessary to assume that the satisfaction here is instinctive,
but only that the relationship noticed is built up in situations which have
their own satisfactions. It is quite possible that this simple drive is 
also the root of mastering behavior. Early in life we enjoy finding that 
we can control the actions of other people by what we say or do, 
especially when this gets us some other satisfaction which at the 
moment is desired. It is also possible that this simple drive is the 
origin of the drive toward the establishment of the self. Sooner or 
later, of course, this interest in and concern for the self acquires its 
own drive. So many of the adjustments of life, such as striving to win 
a game, are driven on by this tendency to exalt the self or maintain its 








1 Fenton, J. C.: A Practical Psychology of Babyhood. Houghton, Mifflin Co.,
1925, p. 45. This is one of the more recent of a long series of recorded observa- 
tions of infant behavior of which the book by Preyer, previously referred to, is an 
early example. These studies give us our best evidence of the springs of human 
behavior. 


2 Thorndike, E. L.: The Original Nature of Man. Pp. 141, 142. 
integrity. If one can be persuaded that “it does not make any
difference to himself personally” he will immediately let down in his 
strivings to win or to excel others. These three separate drives— 
drive to be a cause, drive to mastery, and drive to maintain the self— 
may be different levels of a continuous development. 

Opposed to this drive is the need for protection. The infant 
finds relief, comfort and protection at its mother’s breast; later in his 
bed or his room; later the home affords a haven from the outside. 
There is a natural tendency to run to the mother or father for protec- 
tion. That the infant finds its first satisfactions in a place that is 
warm, soft, and associated with a person (the mother) is the start for 
a need, expressed throughout life, for protection, shelter, security. 
No form of shelter is as effective as that of belonging to a group, which
indicates that this drive is closely related to the gregarious tendency.
Out of the first intimate relations with the mother develops a need
for affection, intimacy and tenderness.

Perhaps for completeness’ sake, curiosity should be mentioned as a
third primary second-order drive to take care of the variety of early 
learned reactions to objects in the outside world. There are, for 
instance, the reaching and grasping movements which aid in food 
getting, exploration with the eyes and manipulation with the hands, 
and later the manipulation of words and ideas. The extent to which 
any of these components of curiosity become urges in their own right 
depends on how well they serve to satisfy the fundamental organic 
urges.

It should be noted how many of these drives are merely aspects 
of the mother-child relationship—the fact that the mother is present, 
that she gives her attention to the baby, that she responds to it 
approvingly, that he learns to control her, that she provides security 
and affection—and that these simple relationships become so firmly 
tied to the infant’s satisfactions that they become primary needs for the
rest of life. These second-order drives are in a sense fundamental 
because by the very conditions imposed by infant nurture no infant 
can escape them. 

Postural tensions set up in the course of the pursuit of situations 
which can relieve fundamental urges themselves become drives, seeking 
relief and easement.1 Muscular “sets” aroused by some stimulus











1 Taylor, W. S.: “Intentions as Urges.” Journal of Abnormal Psychology and
Social Psychology, Vol. XXVII, Jan., March, 1933, pp. 441, 442.
Curti, M. W.: Child Psychology. Longmans, Green & Co., 1931. Chapter 10.
which in the past has been associated with satisfying activity send
repeated kinaesthetic nervous impulses which have driving force.
Of these, “goal ideas” loom as being especially important. Presum-
ably goal ideas are kept alive by unrelieved organic tensions which
cause ends to persist in mind which experience has proven in the past 
to be satisfying or which the imagination promises to be satisfying. 
Naturally such postural states are too individual and too numerous to 
be listed or catalogued. 

In conclusion, then, three types or levels of driving forces or urges 
in the human being are recognized: 


I. Fundamental characteristic of adjustment. 
(a) Drive toward success. 
(b) Drive toward the familiar. 
(c) Drive toward new experiences. 
II. Appetites and aversions. (Tolman’s list.)
    Appetites:
        Food-hunger.
        Sex-hunger.
        Excretion-hunger.
        Specific contact-hunger.
        Rest-hunger.
        Sensory-motor-hunger.
    Aversions:
        Fright (injury-avoidance).
        Pugnacity (interference-avoidance).

III. Derived drives. 
(a) Desire to be with other persons. 
(b) Desire for attention from other persons. 
(c) Desire for praise and approval. 
(d) Desire to be a cause. 
(e) Desire for mastery. 
(f) Desire to maintain the self. 
(g) Desire for security, protection. 
(h) Desire for affection, tenderness, intimacy, sense of belonging. 
(i) Curiosity (reaching, grasping, manipulation, exploration).


These tendencies are present in every individual, but each individ- 
ual satisfies them uniquely according to his experiences. To under- 
stand a person’s adjustment one would do well in each case to inquire 
what situations or stimuli the individual has learned to require to 
satisfy these fundamental drives. 





THE EDUCATIONAL ATTAINMENT OF DELINQUENT 
BOYS 


HOWARD A. LANE AND PAUL A. WITTY 


Northwestern University 


Many causes of delinquency have been hypothesized and occa- 
sionally confidently announced. Investigations have cast doubt 
upon the validity of every surmise, and the search for causes still 
persists. Workers now recognize that no single characteristic can be 
considered the cause of delinquent behavior, and they are attempting
to analyze the matrix of contributing factors. In these analyses 
restricted mental ability and educational retardation have been found 
frequently, and their importance as contributing agencies has received 
emphasis. Recent studies have demonstrated somewhat conclusively 
that the retarded mental status is not the determining specter at one 
time so universally proclaimed. The average IQ of apprehended 
delinquents, now fairly well established, falls in the “dull normal”
category of intelligence ratings. And it is generally conceded that the
average intelligence quotient of all juvenile offenders would be somewhat
higher than that of the apprehended delinquents usually studied.
Competent students generally aver that “delinquency is without
marked relationship to intelligence.”

Although many studies have been published concerning the mental 
status of the delinquent, few comprehensive investigations of the 
educational status of the youthful offender have been conceived and 
effected. However, in the search for the provenance of delinquent 
behavior, the rôle of the school is repeatedly and increasingly emphasized.
The White House Conference Report upon The Delinquent
considers the school to be the great social laboratory in which the 
potential offender may be identified and through which he may be 
rehabilitated. To date, the school has been exceedingly negligent 
in these respects, and many criticisms of its failures have been set 
forth. Even the popular magazines have carried excoriations of the 
school for its dilatoriness in identifying deviates and in providing 
reform. Public school workers are not wholly reprehensible, since 
we know little about the kind and direction of treatment which should 
be undertaken in essaying reform. 

Published studies which present educational test data show that the 
delinquent is decidedly retarded in terms of educational standards for 




696 The Journal of Educational Psychology 


his age and school grade. Typical among reports is that of Doll,1
who found that only five per cent of the boys in a New Jersey institution
for young delinquents reached or exceeded the average score of
public school boys of similar ages on educational tests. And Sullivan2
reported that the average boy in the Whittier, California State School 
for Boys was retarded two years and five months in terms of the 
educational standard for his chronological age. 

Even more serious than this general educational retardation is the 
failure of delinquents to attain those barren educational goals which 
are in keeping with their mental ages. Apprehended delinquents 
typically display educational knowledge considerably below those 
levels which are consonant with prognostications of attainment based 
upon intelligence test results. Chase3 tested one hundred sixty-five
boys who had been in the Whittier School for ten months or more.
Their average educational age upon the Stanford Achievement Test
was almost two years below their average mental age. The boys
discussed by Sullivan4 were thirteen months retarded in terms of
mental age. Other studies have corroborated, in general, the investigations
just cited, showing that apprehended delinquents are markedly
retarded educationally in terms both of their chronological and their
mental ages. However, most of the investigators have dealt with
relatively small numbers of cases; they have presented only fragmentary
data; and the results of educational tests have not been
fully analyzed.

In this paper the writers will set forth and analyze rather fully the 
educational attainment of approximately six hundred fifty boys in the 
St. Charles, Illinois, School for Boys. Educational attainment will 
be compared with chronological and mental age standards, and the 
specific kinds of school subject retardation will be given in detail. 


SOURCES OF DATA 


To all boys entering the St. Charles, Illinois, School for Boys 
(an institution for young delinquents) educational and mental tests 


1 Doll, E. A.: “Education of juvenile delinquents.” Journal of Juvenile
Research, Vol. VI, March, 1921, pp. 331-346.

2 Sullivan, E. B.: “Age, intelligence, and educational achievement of boys
entering Whittier state school.” Journal of Delinquency, Vol. XI, March, 1927,
pp. 23-38.

3 Chase, V. A.: “Educational achievement of delinquent boys.” Journal of
Juvenile Research, Vol. XVI, July, 1932, pp. 189-192.

4 Ibid.
are administered. A unit of the Institute for Juvenile Research of
the State of Illinois directs the testing. Results of the mental and 
educational testing of more than seven hundred boys (tested at the 
time of commitment) were made available to the writers. The 
Stanford Achievement Test was given to the boys by one of the writers 
after the boys had been at St. Charles for about nine months. 


GENERAL EDUCATIONAL STATUS 


When the tests were first given, the median chronological age of the 
boys was fourteen years, 5.6 months. The Otis Group Test of Mental 
Ability yielded a median mental age of twelve years, 9.4 months. 
The median IQ was 88.25; the interquartile range of IQ’s was 79.4 
to 97.0. Based upon the results of the Stanford Achievement test, the 
median educational age was eleven years and six months. Thus it 
appears that the educational attainment of the typical St. Charles 
boy is almost three years below the norm for boys of his chronological 
age, and one year and three months below that consonant with his 
mental age. 
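
The comparison above is plain arithmetic on ages expressed in months. A minimal sketch in Python, using only the medians reported in the text (the helper name `to_months` is ours and purely illustrative):

```python
def to_months(years, months):
    """Convert an age given as years and months into months."""
    return years * 12 + months


chronological = to_months(14, 5.6)  # median chronological age
mental = to_months(12, 9.4)         # median mental age (Otis)
educational = to_months(11, 6.0)    # median educational age (Stanford)

# Months by which educational age falls short of each standard.
behind_chronological = chronological - educational
behind_mental = mental - educational

print(round(behind_chronological, 1))  # 35.6 months, almost three years
print(round(behind_mental, 1))         # 15.4 months, about one year three months
```

Both differences agree with the figures quoted in the paragraph.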

The rate of growth in academic knowledges may be expressed by the 
educational quotient (EQ) which is derived by dividing the composite 
educational age by the chronological age. A child with an EQ of 
one hundred has presumably made educational progress equal to that 
of the typical child of his age. This assumption, of course, is based 
upon the acceptance of the validity of the examination, the adequacy 
of its norms, and its suitability for the child tested. The Stanford 
Achievement Test appears to have some obvious limitations in light 
of the assumptions given above, and its validity for use with delinquent 
children may be questioned. However, all children in St. Charles 
have had some formal education, and about three-fourths of them have 
been enrolled in public schools in urban communities. Therefore, 
the composite educational age may be considered fairly valid, and 
useful in making comparisons. The distribution of EQ’s of six 
hundred twenty-one boys is shown at the top of page 698. 
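
As an illustration of the definition, the quotient can be sketched as follows. The scaling by one hundred is an assumption inferred from the statement that an EQ of one hundred marks typical progress, and the function name is ours.

```python
def educational_quotient(educational_age_months, chronological_age_months):
    """EQ = 100 * educational age / chronological age (scaling assumed)."""
    return 100.0 * educational_age_months / chronological_age_months


# Medians reported in the text, converted to months.
median_educational_age = 11 * 12 + 6.0    # 138.0
median_chronological_age = 14 * 12 + 5.6  # 173.6

eq_from_medians = educational_quotient(median_educational_age,
                                       median_chronological_age)
print(round(eq_from_medians, 1))  # 79.5
```

The quotient of the two medians need not equal the median of the individual quotients, so this figure is only a rough check against the median EQ of 78.44 reported in the text.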

Comparison of the median EQ, 78.44, with the median IQ, 88.25, 
suggests the severity of the educational retardation. This is clearly 
illustrated by the fact that although 18.3 per cent of the boys have 
IQ’s over one hundred, only four per cent have EQ’s above average.
A more exact expression of this type of comparison may be secured 
by computing accomplishment ratios. The accomplishment ratio, or 
the accomplishment quotient, was derived by dividing the EQ by 
[Table: distribution of EQ’s of six hundred twenty-one boys. Only fragments are legible: 79.36; Median* (value illegible); 11.86.]


* These measures were calculated from a distribution table having intervals 
of five points. 


the IQ. The distribution of AR’s for six hundred twenty-one boys
follows: 
The preceding tabulation suggests that these delinquent boys have
not been motivated to use their ability fully, and that they have not 
exercised the effort necessary to yield appropriate educational gains. 
Only eleven per cent of the boys have accomplishment ratios above 
one hundred, while eighty-nine per cent fall below this average. 
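
The accomplishment ratio can be sketched in the same way. The scaling by one hundred is again an assumption (the text treats one hundred as the average), and the function name is ours.

```python
def accomplishment_ratio(eq, iq):
    """AR = 100 * EQ / IQ (scaling by 100 assumed)."""
    return 100.0 * eq / iq


# Medians reported in the text for the St. Charles boys.
ar_from_medians = accomplishment_ratio(78.44, 88.25)
print(round(ar_from_medians, 1))  # 88.9
```

A ratio below one hundred means that educational attainment falls short of what the intelligence rating would predict. The figure computed from the two medians is only indicative, since the distribution of individual AR’s is not reproduced here.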

Further evidence of the failure of the boys to make educational 
gains commensurate with their ability was obtained by correlating 
the AR’s with the IQ’s. A resultant coefficient of correlation (Pearson 
Product-Moment) of —.21 suggests a slight tendency for dull boys 
to use their abilities more successfully than the bright boys. Although 
only six per cent of the boys with IQ’s over one hundred exceeded the 
accomplishment ratio, one hundred, forty-six per cent of those with 
IQ’s below seventy achieved test scores higher than the averages for 
children of their mental ages. 
ANALYSIS OF PARTS IN STANFORD ACHIEVEMENT TEST


Significant differences appear in the performance of the delinquent 
boys on the parts of the Stanford Achievement Test. Table I gives 
the median, Q3, and Q1 for each of the sub-tests in the examination.


TABLE I.—THE ACHIEVEMENT OF SIX HUNDRED THIRTY-SIX DELINQUENT BOYS
UPON THE STANFORD ACHIEVEMENT TEST

                                  Grade standing
Subject                        Median      Q3       Q1
Paragraph meaning               5.7        7.3      4.4
[Word meaning]                  6.0        7.9      4.8
Dictation (spelling)            8.4 [?]    7.0      4.1
[Language usage]                5.1        6.9      3.8
[illegible]                     [?]        7.6      4.4
[illegible]                     6.1        [?]      4.7
[illegible]                     [?]        7.4      4.8
[illegible]                     5.7        7.0      4.5
[Arithmetic reasoning]          6.4        7.6      5.0
Arithmetic computation          5.1        6.3      4.3








It is of interest, and perhaps of significance, that the poorest 
attainment was elicited in those subject fields which require and 
demand drill in the typical school. Spelling, arithmetic computation, 
and language usage (weighted with grammar) are areas in which the 
typical school secures its products by formal drills which frequently 
are tedious, onerous, and distasteful. Noticeable is the relatively 
high achievement of the delinquent boy in arithmetic reasoning. 
In this test, there are many items which, the writers think, reflect 
acquisitions which are not to a marked degree the product of the 
special opportunity and drill which the school provides. 


EDUCATIONAL ACHIEVEMENT AT ST. CHARLES


When this study was made the St. Charles institution maintained 
an elementary school organized in the usual grade divisions. The 
boys were placed in grades corresponding roughly to the level of their 
composite achievement upon the Stanford Achievement Test. It is 
doubtful that the academic department of an institution for malad- 
justed children should emphasize the type of work which has hitherto 
failed to attract the children’s interest and elicit their effort. Nevertheless,
it will be of interest to discern whether decided gains in
academic skills and knowledges result from such a program. With 
this in mind, five sections of the Stanford Achievement Test were 
administered to one hundred sixty-seven boys (not recidivists) under 
sixteen years of age who had been in the school for six months or more.1
The results were compared with scores made by the boys at the time 
they entered the school. The sections of the test used were: Para- 
graph meaning, word meaning, language usage, arithmetic reasoning, 
and arithmetic computation. These boys, who had been in school 
an average of 8.9 months, were divided into three groups: Group A 
contained seventy-two boys with IQ’s above ninety-five; group B 
included forty-five boys with IQ’s eighty-five to ninety-five; in 
group C, there were fifty boys with IQ’s below eighty-five. Table II 
sets forth the results of the two examinations as well as the gains 
made in each sub-test. 

All groups gained decidedly during the interval between the test- 
ings. Furthermore, the average gains are approximately equal to 
the acquisitions of typical public school children during a comparable 
period of time. In arithmetic reasoning only is the obtained gain less 
than that expected of “normal” public school children during a similar
length of time. This seems particularly significant in view of the fact 
that the St. Charles boys attend school only three hours daily. It is of 
interest that the smallest gain was made in the arithmetic reasoning 
test upon which the children were relatively high at the time of com- 
mitment. The gain in arithmetic computation, however, is large. 
Indeed, in nine months even the extremely retarded boys gained more 
than a year in arithmetic computation, but they progressed to the 
extent of one month only in arithmetic reasoning. 
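
The gains in Table II are consistent with taking the difference between the two grade standings and counting ten school months to a grade. That conversion is an inference from the tabled figures, not something the text states. A brief sketch, using the arithmetic computation and arithmetic reasoning rows:

```python
def gain_in_school_months(grade_retest, grade_commitment, months_per_grade=10):
    """Gain between two grade standings, in school months.

    months_per_grade=10 is an assumption inferred from Table II.
    """
    return round((grade_retest - grade_commitment) * months_per_grade)


# (grade on retest, grade at commitment) for Groups A, B, C and the total group.
arithmetic_computation = {"A": (8.6, 6.8), "B": (7.3, 5.9),
                          "C": (5.7, 4.5), "Total": (7.3, 5.9)}
arithmetic_reasoning = {"A": (8.7, 7.8), "B": (7.3, 6.8),
                        "C": (5.4, 5.3), "Total": (7.4, 6.8)}

computation_gains = {g: gain_in_school_months(*pair)
                     for g, pair in arithmetic_computation.items()}
reasoning_gains = {g: gain_in_school_months(*pair)
                   for g, pair in arithmetic_reasoning.items()}

print(computation_gains)  # {'A': 18, 'B': 14, 'C': 12, 'Total': 14}
print(reasoning_gains)    # {'A': 9, 'B': 5, 'C': 1, 'Total': 6}
```

The Group C figures show the contrast drawn in the paragraph: twelve months gained in computation against one month in reasoning.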

These results should not be used unequivocally to attest to the 
unusual effectiveness of the academic program at the St. Charles 
School. A program which provides regular habits of living and 
working, with time and opportunity for reading, should produce a 
definite and unusual acceleration in the educational achievement of 
delinquent boys. The gains suggest achievements which the public 
school might attain with similar groups if facilities for child study and 
guidance, including adequate diagnosis and remedial teaching, were
provided. 





1 The average IQ of this group was 92.2. 
TABLE II.—GAINS MADE BY ONE HUNDRED SIXTY-SEVEN DELINQUENT BOYS ON
FIVE SECTIONS OF THE STANFORD ACHIEVEMENT TEST

                                  Group   Group   Group   Total
                                    A       B       C     group
Average IQ                         [?]     90.1    78.2    92.2
Months in school                   8.7     8.9     9.2     8.9
Paragraph meaning.
  School grade on retest           8.9     7.0     5.6     7.5
  At commitment                    7.8     6.3     4.7     6.5
  Gain (school months)             11       7       9      10
Word meaning.
  School grade on retest           9.1     7.7     6.1     7.9
  At commitment                    8.3     6.8     5.1     6.9
  Gain (school months)              8       9      10      10
Language usage.
  Grade on retest                  7.8     6.7     4.7     6.6
  At commitment                    7.1     5.6     4.1     5.8
  Gain (school months)              7      11       6       8
Arithmetic reasoning.
  Grade on retest                  8.7     7.3     5.4     7.4
  At commitment                    7.8     6.8     5.3     6.8
  Gain (school months)              9       5       1       6
Arithmetic computation.
  Grade on retest                  8.6     7.3     5.7     7.3
  At commitment                    6.8     5.9     4.5     5.9
  Gain (school months)             18      14      12      14
Average of scores.
  Grade on retest                  8.6     7.2     5.5     7.3
  At commitment                    7.6     6.3     4.7     6.4
  Gain (school months)             10       9       8       9





This study suggests the significant rôle which educational retardation
plays in the lives of delinquent boys. Although mental retardation
characterized the group, it was much less noticeable and grave
than was retardation in educational growth (as measured by the 
Stanford Achievement Test). Nevertheless, in the St. Charles School, 
these boys demonstrated the capacity to profit by the instruction 
which was planned in accord with their mental and educational status. 
These facts have obvious and important educational implications for
the educator who desires to provide in his school the best conditions 
possible for all children. One may reasonably infer that the low (but 
improvable) educational status of delinquent children is an important 
element in producing discontent and in engendering a-social behavior. 
One may assume further that improvement in educational opportunity 
—with adaptation of materials and methods of instruction to individual 
differences in ability and interest—may ameliorate somewhat the 
conditions that foster maladjustment and contribute to delinquent 
behavior. 


ITEM VALUE AND TEST RELIABILITY 
UVAN HANDY AND THEODORE F. LENTZ 
Washington University 

Items are evaluated as an aid to test construction. Whatever 
be the method of evaluation and whether the process be called validation
of items or the selection of those items which yield highest reliability,
there is obtained a measure of the degree to which an item differentiates
two groups. Items of high differentiating value should yield a higher 
reliability than those of low value. The present experimental verifi- 
cation of this most obvious assumption was not undertaken merely 
to see if it were true but to obtain some quantitative data concerning 
the relationship. Such data is important not only because it enables 
us to validate a method of item evaluation but because a knowledge 
of item values permits an estimation of reliability. For this esti- 
mation we propose a variation on the Spearman-Brown formula based 
on an empirically determined relation between item value and reli- 
ability as described in the present study. 

The foregoing considerations concerning item value and test 
reliability were investigated by means of the Lentz Social Science 
Opinionaire, Forms E & F. These are preliminary forms for the 
development of an instrument for the measurement of conservatism. 
The forms contain four hundred thirty-seven items which the subject 
is asked to mark plus or minus according to whether or not he agrees 
with the statements. Sample items are: 


163. Any science which conflicts with religious beliefs should be taught cau- 
tiously, if at all, in our schools. 

315. Socially-minded experts, rather than voters, should decide the policies
of government. 


The forms were given to five hundred eighty students from seven 
colleges. 

The responses were tabulated on forty-five column Hollerith cards. 
Each person was represented by a card and each item was assigned a 
definite position thereon. The card was punched for a “plus”
response; a “minus” response was indicated by a blank. Omissions
were not punched but were marked with a colored pencil. Since 
the omissions were a very small part of the nearly quarter million 
responses they were neglected in the scoring. All scoring was done
by placing a key card over the data card. No items were weighted
throughout this experiment. Since the score on any group of items
consisted of the number of plus responses to the conservative items 
added to the number of minus responses to the radical items the key 
was a colored card with the positions to be scored, punched. The 
conservative and radical items were differentiated by checking the 
radical items on the key or by covering the radical positions with a 
transparent colored paper. These details are included because where 
there is much scoring to be done scoring from cards is much cheaper 
than from the original papers. 
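
The scoring rule just described can be sketched in a few lines. The item numbers, the key, and the respondent below are all hypothetical, since the text does not say which items were keyed conservative or radical; omissions are simply left out of the tally, as in the study.

```python
def conservatism_score(responses, conservative_items, radical_items):
    """Unweighted score: plus responses to conservative items
    added to minus responses to radical items.

    responses maps item number -> "+" or "-"; omitted items are absent.
    """
    plus_conservative = sum(1 for item in conservative_items
                            if responses.get(item) == "+")
    minus_radical = sum(1 for item in radical_items
                        if responses.get(item) == "-")
    return plus_conservative + minus_radical


# Hypothetical four-item key and one hypothetical respondent.
conservative_items = {1, 2}
radical_items = {3, 4}
responses = {1: "+", 2: "-", 3: "-"}  # item 4 was omitted

score = conservatism_score(responses, conservative_items, radical_items)
print(score)  # 2
```

Sets play the part of the punched key card here: membership in a set corresponds to a punched position on the key.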

The entire set of cards was scored for all four hundred thirty-seven
items. The items were then evaluated with the aid of a Hollerith
machine according to the U-L method which was found best.1 The
number of plus responses of the highest one hundred seventy-four 
subjects to any conservative item minus the number of plus responses 
of the lowest one hundred seventy-four would be the value of the item. 
For a radical item of course the sign would be reversed. The 
highest possible item value would be one hundred seventy-four. The 
actual item values ranged from 105 to —38. For this study the best 
four hundred items were used. The lowest item value of those items 
used was three. In order to make the present values comparable to 
other data we shall express the raw value as a per cent of the highest 
possible value, i.e., the raw value divided by one hundred seventy-four.
Thus, the value of our best item would be transformed from 105 to 
.603. 
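
A minimal sketch of the U-L item value as described above. The function name is ours, and the counts of plus responses (150 and 45) are hypothetical, chosen only so that the raw value equals the 105 reported for the best item.

```python
def ul_item_value(plus_upper, plus_lower, group_size, radical=False):
    """Return (raw value, value as a proportion of the highest possible).

    plus_upper and plus_lower are the numbers of plus responses among the
    highest and lowest scoring subjects; the sign is reversed for a
    radical item.
    """
    raw = plus_upper - plus_lower
    if radical:
        raw = -raw
    return raw, raw / group_size


# Upper and lower three-tenths of 580 subjects: 174 in each group.
raw, proportion = ul_item_value(plus_upper=150, plus_lower=45, group_size=174)
print(raw, round(proportion, 3))  # 105 0.603
```

Dividing by the group size puts values from groups of different sizes on a common scale running from minus one to plus one.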

The items were arranged in order of their item evaluation from 
best to worst. The items were then noted as odd or even. The best 
four hundred items were then divided into eight groups of fifty each, 
the best fifty, the second best fifty etc. The twenty-five odd and 
twenty-five even items of each fifty were comparable tests. The 
average item value as well as the range in value of the odd items in 
each fifty was practically identical with the average and range in 
item value of the even items in the same group. In each of the eight 
groups the odd items were correlated with the even items. 





1 Lentz, Theodore F., Hirshstein, Bertha and Finch, F. H.: “Evaluation of
Methods of Evaluating Items.” Journal of Educational Psychology, Vol. XXIII,
May, 1932, pp. 344-350.

In this study, however, the U-L fraction was changed from upper and lower 
thirds to upper and lower three-tenths. This change was prompted not by 
experimental evidence but to facilitate obtaining even fraction intervals in a 
later evaluation of fraction size. 
The odd and even scores on the second best fifty items were added
to the odd and even scores respectively of the best fifty items. Then 
to the scores on the best hundred were added the scores of the third 
fifty etc. Thus, cumulative odd and even test scores were obtained 
for each student at regular test length intervals of twenty-five vs. 
twenty-five items, fifty vs. fifty items etc. up to two hundred vs. two 
hundred items. The reliability of the test for each of these lengths 
was then determined. 

All correlations in this experiment were computed after the Otis
form of the Pearson product moment but each possible score was 
represented by a class interval and no grouping of different scores 
in the same class interval was permitted. The entire group of five 
hundred eighty subjects was used in obtaining all correlations herein 
mentioned. 


TABLE I.—SHOWING THE RELATION OF ITEM VALUE TO RELIABILITY

   I            II            III      IV        V           VI          VII
            25 vs. 25         U-L                       r = -.001945n
 Group   uncorrected r ± PE   value    r¹     IV - II      + .8927     VI - II

   1       .834 ± .009        .443    .840    +.006         .844        +.010
   2       .743 ± .013        .343    .730    -.013         .747        +.003
   3       .684 ± .015        .282    .644    -.040         .650        -.034
   4       .519 ± .021        .226    .550    +.031         .552        +.033
   5       .433 ± .023        .183    .467    +.034         .455        +.022
   6       .375 ± .024        .137    .368    -.007         .358        -.017
   7       .280 ± .026        .095    .267    -.013         .261        -.019
   8       .132 ± .028        .049    .145    +.013         .163        +.031

1 $r = 1 - (1 - v)^{3.117638}$. This relation is discussed in the next paragraph.


Table I shows very clearly that a drop in item value is accom- 
panied by a drop in reliability. The relationship which is curvilinear 
cannot be well expressed by a correlation-ratio because of the few 
figures. The relationship can be expressed by an algebraic equation. 
In addition to the use of the observational data in the determination, 
by the method of least squares, of the constants for an empirical 
equation relating item value to test length we are justified in positing 
two rigorous conditions which the curve must satisfy. When the 
item value is zero the reliability must be zero. When the item value 
is unity the reliability must be unity. A least square solution for a 
test length of twenty-five items gives $r = 1 - (1 - v)^{3.117638}$, where r
is the reliability and v is the average item value. How well this
equation fits the data may be seen from Table I. The observations 
are in column II, the curve values in column IV and their difference 
in column V. Since the items were arranged according to their value 
the reliability of an item or a group of items of similar value should 
be a function of the rank of the item or items in value. While this 
relationship is definite it is only a coincidence that for the present 
data and test length it can be expressed by a straight line formula.
A least square solution gives r = -.001945n + .8927, where r is the
reliability of twenty-five items whose average value is equal to the
value of the nth best item, and n is the rank of that item. The values
of this equation are given in Table I, column VI, and the residuals in
column VII. When, for a given test length, the reliability has been
expressed as a function of item value or other variable, the function
may be substituted in the Spearman-Brown formula and, provided
comparable items are used in varying the test length, the reliability
of any test length may be expressed as a function of item value or other
variable. Table I must be considered a strong recommendation for
the U-L method.
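
The two fitted relations can be checked against Table I with a short sketch. Two things here are assumptions rather than statements of the source: the exponent 3.117638 is our reading of a constant that is garbled in the print, and n is taken as the middle rank of each group of fifty (25, 75, and so on). Both choices bring the sketch close to the tabled columns IV and VI.

```python
EXPONENT = 3.117638  # assumed reading of the garbled constant


def reliability_from_value(v, exponent=EXPONENT):
    """Curve through (0, 0) and (1, 1): r = 1 - (1 - v) ** exponent."""
    return 1.0 - (1.0 - v) ** exponent


def reliability_from_rank(n):
    """Straight-line formula from the text: r = -.001945 n + .8927."""
    return -0.001945 * n + 0.8927


# Columns II and III of Table I: observed r and U-L value for each group.
observed = [0.834, 0.743, 0.684, 0.519, 0.433, 0.375, 0.280, 0.132]
values = [0.443, 0.343, 0.282, 0.226, 0.183, 0.137, 0.095, 0.049]
# Assumption: n is the middle rank of each group of fifty items.
ranks = [25 + 50 * g for g in range(8)]

for group, (r_obs, v, n) in enumerate(zip(observed, values, ranks), start=1):
    curve = reliability_from_value(v)
    line = reliability_from_rank(n)
    print(f"{group}  obs={r_obs:.3f}  curve={curve:.3f}  line={line:.3f}")
```

The curve satisfies both boundary conditions named in the text by construction: it gives zero reliability at zero item value and unity at an item value of unity.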


TABLE II.—CUMULATIVE RELIABILITY AND DECREASING ITEM VALUE

      I               II             III       IV         V        VI      VII
     Test         Cumulative                            Average
    length     uncorrected r ± PE     r¹     III - II     item      r²    VI - II
                                                         value

  25 vs. 25       .834 ± .009        .834     .000       .445     .840    +.006
  50 vs. 50       .885 ± .008        .881    -.004       .394     .883    -.002
  75 vs. 75       .904 ± .005        .904     .000       .357     .901    -.003
 100 vs. 100      .908 ± .005        .901    -.007       .324     .905    -.003
 125 vs. 125      .911 ± .005        .899    -.012       .296     .909    -.002
 150 vs. 150      .912 ± .005        .897    -.015       .269     .909    -.003
 175 vs. 175      .901 ± .005        .893    -.008       .259     .918    +.017
 200 vs. 200      .892 ± .006        .883    -.009       .220     .902    +.010

1 Predicted by the Douglass-Cozens formula and discussed in the next
paragraph.

2 Predicted by a formula of the authors. It is based entirely on item values
and is discussed below.


In Table II column II the reliabilities and their probable errors 
are recorded for test lengths of from twenty-five vs. twenty-five items 
to two hundred vs. two hundred items. At every point the successive 
reliabilities are lower than would have been predicted by the Spearman-
Brown formula from any of the preceding reliabilities. This lag
and drop in reliability is caused by the addition of non-comparable
items, i.e., items of lower value, and must not be interpreted as an
indictment of the formula. It is to be taken as an example of the
disappointment awaiting the naive who put all the best items in the
first form of a test and construct the second from poorer items. In column
III are the cumulative reliabilities as estimated by the Douglass-
Cozens1 formula.
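
The statement about the Spearman-Brown formula can be checked directly from column II of Table II. The sketch below uses the general form of the formula for a test lengthened by a factor k; the function and variable names are ours.

```python
def spearman_brown(r, k):
    """Reliability predicted for a test lengthened k times."""
    return k * r / (1.0 + (k - 1.0) * r)


# Column II of Table II: observed reliability by test length,
# in units of twenty-five versus twenty-five items.
observed = {1: 0.834, 2: 0.885, 3: 0.904, 4: 0.908,
            5: 0.911, 6: 0.912, 7: 0.901, 8: 0.892}

# Predict every longer test from every shorter one and compare.
shortfalls = []
for m, r_m in observed.items():
    for n, r_n in observed.items():
        if n > m:
            predicted = spearman_brown(r_m, n / m)
            shortfalls.append((m, n, predicted - r_n))

print(all(gap > 0 for _, _, gap in shortfalls))  # True
print(round(spearman_brown(observed[1], 8), 3))  # 0.976, against 0.892 observed
```

Every prediction exceeds the observed figure, which is what one would expect when each added block of items has a lower value than the blocks already in the test.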

In addition to the assumptions made by the authors of the formula 
the present writer assumed that the intercorrelation of each group of 
items with every other group of items would correlate to unity when 
corrected for attenuation. The average intercorrelation then reduces 
to the average of geometric means and thus intercorrelations are 
entirely eliminated. This assumption is justified because the batteries 
were constructed as comparable except as to item value. Further- 
more, if the assumption could not be made the formula would be 
convicted of underestimation. As it is the Douglass-Cozens formula 
gives an estimation of reliability which is very close to that obtained 
by experiment. The differences between estimated and observed 
values are given in column IV. In column VI are estimates of cumulative
reliability based on the following formula of the present writers

$$r = \frac{N\left[1 - (1 - v)^{3.117638}\right]}{1 + (N - 1)\left[1 - (1 - v)^{3.117638}\right]}$$

where v is the cumulative average item value and N is the test length
in units of twenty-five items, i.e., N would be four for a hundred item
test. The residuals are in column VII. This formula is a combination
of the relation of item value to test reliability and the Spearman-Brown
formula. For the present data the error of the formula
of the writers is even less than that of the Douglass-Cozens formula. 
We propose that test reliability be estimated from item values. The 
form of the formula will of course vary with the method of item 
evaluation. 
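
A sketch of the proposed estimate follows: the reliability of a twenty-five item unit is taken from the average item value and then stepped up by the Spearman-Brown formula. The exponent 3.117638 is our reading of a constant that is garbled in the print, and the function names are ours. Only a few rows of Table II are used.

```python
EXPONENT = 3.117638  # assumed reading of the garbled constant


def unit_reliability(v, exponent=EXPONENT):
    """Reliability of a twenty-five item test of average item value v."""
    return 1.0 - (1.0 - v) ** exponent


def predicted_reliability(v, n_units):
    """Spearman-Brown step-up of the unit reliability to n_units units."""
    r = unit_reliability(v)
    return n_units * r / (1.0 + (n_units - 1.0) * r)


# (N, cumulative average item value, observed r) for selected rows of Table II.
rows = [(1, 0.445, 0.834), (2, 0.394, 0.885), (4, 0.324, 0.908), (8, 0.220, 0.892)]

for n_units, v, r_obs in rows:
    estimate = predicted_reliability(v, n_units)
    print(f"N={n_units}  v={v:.3f}  estimate={estimate:.3f}  observed={r_obs:.3f}")
```

For these rows the estimates fall within about .002 of column VI of Table II. Only item values and test length enter the calculation, which is the economy the writers claim for the method.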

Correlation, in any of its forms, and its attendant scoring of tests
is an expensive process. The test constructor who must in any event 
evaluate items should consider well the relation of item value to test 










1 Douglass, H. R., and Cozens, F. W.: “On Formula for Estimating the Reliabil-
ity of Test Batteries.” Journal of Educational Psychology, Vol. XX, May, 1929, 
pp. 369-377. 
reliability as a means of its prediction. The method herein described
is an attempt to predict reliability with much of the drudgery of inter- 
correlation eliminated and at the same time permit consideration of 
non-comparable items. Having drawn a curve showing the relation 
of test length to reliability for a particular group of items each test 
constructor must decide for himself what test length he can afford in 
view of the returns in terms of predicted reliability. There are many 
points along the lines of the present study which await investigation. 
The optimal U-L fraction is still unknown. A more rigorous ascer- 
tainment of optimal evaluation procedure should be a prerequisite 
to further experiments similar to the present one. It may be that 
if items were arranged in a frequency distribution according to their 
item value and a Pearson Type curve fitted to the distribution a 
general equation relating the frequency distribution to the reliability 
of the n best items could be obtained. The presence of multiple 
factors may or may not affect the accuracy of estimates by the pro- 
posed formula. In the development of the present theme there are 
many possibilities. 


BOOK REVIEWS 


LUELLA COLE. Psychology of the Elementary School Subjects. New
York: Farrar and Rinehart, 1934, pp. XVI + 330. 


H. C. HINES. Introduction to Educational Psychology. New York:
D. Van Nostrand Co., 1934, pp. XXIX + 381. 


These two books, the first detailed and almost diffuse, the second
so condensed as to be little more than a summary, are alike in one 
respect—both are full of practical applications of the principles they 
discuss. In this respect they should be of great value in the actual 
teaching work of the classroom. 

Dr. Cole’s book on the Psychology of Elementary School Subjects 
breaks sharply with the tradition established by Reed, Pyle, and 
Garrison and Garrison. She has selected for discussion only those 
topics which can actually be used to increase teaching efficiency 
in the classroom. In one respect, therefore, it is a book on methods 
based on psychological research; in some places it even sinks to the 
level of special devices and tricks of the trade. Dr. Cole, as a consulting
psychologist, has been frequently confronted with “problem
children” and the air of the clinic pervades the volume. The chapter
entitled “Interesting, Easy and Profitable Research,” is characteristic
of the volume as a whole. The advice to study the vocabularies of 
readers, geographies, histories, etc. is sound as it is easy, but it is 
not research in the true sense of the term. Moreover, the individual 
teacher can do very little to remedy any of the defects she may dis- 
cover. The idea behind the writing of the chapter was perfectly 
sound, but somehow or other it seems to have gone wrong by trans- 
lation into practice. 

Professor Hines’s book is as condensed as Dr. Cole’s is diffuse.
Truths become untruths because they are so tersely stated. The 
statements about the conditioned reflex, the endocrine glands and the 
nervous system are of this nature. But everything is docketed and 
numbered and the youthful student will be able to learn the book 
almost by heart. The book is eclectic in outlook and elementary in 
style, so no doubt it will fill the needs of the students for whom it 
is intended, namely, teachers in training in normal schools. 

PETER SANDIFORD. 
University of Toronto. 
S. J. HOLMES. The Eugenic Predicament. New York: Harcourt, 
Brace & Co., 1933, pp. XI + 232. 


Educational psychologists will feel perfectly at home in this 
book. Though in the preface the author, Dr. S. J. Holmes, tells 
us that the purpose of the book is to disseminate the facts of heredity 
and eugenics, the actual content on the basis of which the author 
builds his case consists of facts that are much more familiar to students of 
educational psychology than to students of eugenics. By and large 
they are findings from the psychometric studies of the role of nature 
and nurture in the development of intellectual and non-intellectual 
traits. The chapters on “The Legions of the Ill-Born” and “Heredity 
of Superior Ability” are loaded with such content. The author, 
though not an educational psychologist, not only shows a familiarity 
with contributions from the field that are relevant for his purposes, but 
also shows an expert skill in the use of them for proving what he 
wants to prove. The general style of writing in this book gives one 
the impression of the author being a man who is too kindly and sane 
to be an alarmist. The book convinces. It is persuasive. Clarity 
and persuasiveness, however, are not always correlates of sound 
thinking. In this book they often are not. 

It is not fair to call the author an alarmist. He is not. But 
some of the convictions that he has are quite like the convictions 
of alarmists who neither have his organized knowledge nor the ability 
effectively to use such organized knowledge. In general terms, his 
convictions can be summarized in some such manner as the following: 
Human traits are inherited. All human inheritance follows the 
Mendelian principles. Intellect can be bred. Genius is born, not 
made. Superior individuals do not reproduce enough of their kind. 
The undesirable are reproducing much too much of their kind. The 
race is deteriorating. It must be saved. Eugenics is the way to 
save it. 

Biases in the selection and interpretation of materials made in 
the book include the following: Books of cultural anthropology are 
almost entirely neglected. Many relevant facts of economics are not 
considered. The Stanford studies on the role of nature and nurture in 
the development of intelligence, with findings to the liking of eugen- 
icists, are made much ado of. On the other hand, the Chicago studies 
on the same topic, with findings not so much to the liking of the 
eugenicists, are not mentioned. Developmental studies in which 
modifications by intrinsic methods are brought about are over-empha- 
sized. Similar studies wherein modifications are brought about by 
extrinsic influences are either under-emphasized or not mentioned. 
Persons interested in the problem of race betterment, regardless 
of bias, will find this a stimulating book. 

H. MELTZER. 
Psychological Service Center, St. Louis. 


FLOYD C. DOCKERAY. General Psychology. New York: Prentice-Hall, 
1932, pp. XXI + 581. 


GARDNER MURPHY. General Psychology. New York: Harper and 
Brothers, 1933, pp. X + 657. 


Both of these are excellent books and will be more than satisfactory 
as texts. 

The distinguishing virtue, or defect, of the Dockeray book is that 
it is written definitely from the Behavioristic point of view. The 
dedication is to A. P. Weiss, and the work embodies his genetic 
approach. After a discussion of the place of psychology among the 
sciences, there is an account of the evolution of the nervous system 
in the lower animals. This is followed by a description of infant 
behavior. Emotion is considered to be disorganized response. This 
theory prevents the differentiation of clear cut patterns for the specific 
types of emotion. Perception and attention are treated together 
in the chapter on organized responses. A possible influence of Gestalt 
psychology may be seen in the fact that this chapter precedes the 
one on sensation. With canny discretion the author avoids presenting 
any list of instincts, or even prepotent reflexes, and declines to hazard 
a color theory. Köhler’s experiments on ape learning are reviewed, 
but the insight theory is of course denied. There is a good chapter 
on intelligence tests denoted “Levels of Attainment.” In all, the 
subject-matter of the beginning course is everywhere adequately 
presented. 

The Murphy book is distinguished in that it does not represent any 
one school. All aspects of the subject are treated with admirable 
impartiality. At times the influence of Gestalt is noticeable, and the 
author is not afraid to mention various philosophical problems which 
are here and there suggested. Perhaps this departmental antagonism 
is finally being outgrown. However, the book is more thoroughly 
experimental and less speculative than are most other texts, especially 
some of the behavioristic ones. The outline of contents is startling 
in some of its innovations, but all of the traditional bits of subject- 
matter are presented somewhere or other as well as many which 
have been hitherto neglected. This amazing comprehensiveness 
includes material on the history of psychology, abnormal, industrial, 
social, and child psychology, experimental aesthetics, tests, statistics, 
genetics, the psychology of personality, and recent experiments on 
thought. The book is longer than the average text, and relatively 
less attention is given to neurology and sensation. There are many 
new illustrations and graphs. The material will be difficult for 
beginners. The author seems at times to be writing for readers who 
are already familiar with the mass of research and controversy back 
of the summary which is presented. In other places the student 
suddenly meets with references to the pituitary gland or the autonomic 
nervous system. The reply may, however, be made that it is precisely 
the function of the instructor to supplement the text in difficult places, 
and that it is better to introduce physiology piecemeal as this may be 
needed. The book will serve admirably as a text for a year course 
with students of a certain intellectual maturity. 

MELVIN RIGG. 
Kenyon College. 